
Data Exporter

[Figure: OR Platform architecture diagram. Layers, top to bottom: Platform Users (engineers and low-code ops users); ORA (AI planning interface); SDK UI (FDK, DDK, MDK, Nexus, SCDK); SDK API (GraphQL Federation Gateway); Microservices (domain IP services); Deployed OR Applications, used by Application Users (operations teams).]

Overview

Data Exporter is the OR platform's data extraction and delivery service. It exports operational and analytical data from the platform's data warehouse and makes it available to external consumers through secure download links, acting as the primary interface between the platform's internal data stores and external analytics environments.

The service operates in two modes: an on-demand API-triggered export that generates data files for a requested time range, and a scheduled config-driven export that runs daily. Both modes extract data from the analytics warehouse, apply PII removal and data classification filters, write files to cloud storage, and expose download links through the platform's API.

Data Exporter is tightly coupled with Post Monitoring, which maintains the warehouse schemas that Data Exporter reads from. Together they form the platform's historical data pipeline: Post Monitoring handles the ETL into the warehouse, and Data Exporter handles extraction to external consumers.

Architecture

Both export modes, on-demand and scheduled, integrate with the platform's data warehouse, cloud storage, and central orchestration service. The components and connections involved are listed below.

Key Components

  • On-demand Export — API-triggered export flow. Accepts a time range, generates a request ID, executes warehouse queries to create data files, and returns secure download URLs.
  • Config-driven Export — Scheduled export flow running daily. Reads export configuration, applies data classification rules, and exports tables to cloud storage.
  • PII Removal — Strips personally identifiable information during the export transformation step before data leaves the platform boundary.
  • Data Classification Engine — Filters exported columns and rows based on classification rules (no restrictions, mixed, do not export, handle with care).
  • Warehouse Export — Executes optimized warehouse queries to export results directly to cloud storage in efficient file formats.

Connections

| Direction | Service | Purpose |
| --- | --- | --- |
| In | Central Orchestration Service | Triggers on-demand export requests |
| In | Automated Scheduler | Triggers scheduled daily exports |
| In | Data Warehouse | Source data for exports |
| Out | Cloud Storage | Data file storage and secure URL generation |
| Out | Configuration Store | Export configuration (config-driven mode) |
| Out | Notification Service | Success/failure notifications for scheduled exports |

Data Flow

On-demand Export

External requesters (such as analytics platforms) submit export requests with a time range through the platform API. The Data Exporter generates a request ID, queries the data warehouse with PII removal filters, writes the results to cloud storage, and tracks progress. Upon completion, it generates secure download URLs (valid for 1 day) grouped by table and returns them to the requester.

Config-driven Export

The scheduled export runs daily, reading export configuration and data classification rules. For each enabled table, it builds filtered queries, applies classification filters, exports data to date-partitioned cloud storage, and updates status tracking. After all tables complete, it sends success/failure notifications.
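
The scheduled flow described above can be sketched as a loop over the enabled tables. This is a minimal illustration, not the service's implementation: `TableConfig`, `run_daily_export`, `summarise`, and the `export_table` callable are hypothetical names, and the status strings are borrowed from the Export Status Tracking section.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TableConfig:
    """Hypothetical stand-in for one row of the export configuration."""
    name: str
    enabled: bool


def run_daily_export(
    tables: List[TableConfig],
    export_table: Callable[[str], None],
) -> Dict[str, str]:
    """Export every enabled table and return the final status per table."""
    statuses: Dict[str, str] = {}
    for table in tables:
        if not table.enabled:
            continue  # disabled tables are skipped entirely
        statuses[table.name] = "IN-PROGRESS"
        try:
            export_table(table.name)  # assumed to raise on failure
        except Exception:
            statuses[table.name] = "FAILED"
        else:
            statuses[table.name] = "COMPLETED"
    return statuses


def summarise(statuses: Dict[str, str]) -> str:
    """Build the text of the success/failure notification."""
    failed = sorted(name for name, s in statuses.items() if s == "FAILED")
    if failed:
        return "FAILURE: " + ", ".join(failed)
    return f"SUCCESS: {len(statuses)} table(s) exported"
```

The notification is built only after the loop finishes, matching the "after all tables complete" ordering in the description.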

GraphQL Endpoints

requestData (Mutation)

Triggers the on-demand data export process.

| Parameter | Type | Description |
| --- | --- | --- |
| startTime | Integer (epoch) | Start of the export time range |
| endTime | Integer (epoch) | End of the export time range (max 1 day from start) |
| Returns | String | Request ID (UUID) for status tracking |
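
As a rough sketch, a client could validate the time range locally before sending the mutation. The example assumes that the epoch values are in seconds, that both arguments are declared as non-null `Int`, and that the request body follows the usual GraphQL-over-HTTP JSON shape; none of these details are confirmed by this page. `build_request_data_payload` is a hypothetical helper name.

```python
import json

# The 1-day limit, assuming startTime and endTime are epoch seconds.
MAX_RANGE_SECONDS = 24 * 60 * 60

# Assumed mutation document; the argument types are a guess.
REQUEST_DATA_MUTATION = """
mutation RequestData($startTime: Int!, $endTime: Int!) {
  requestData(startTime: $startTime, endTime: $endTime)
}
"""


def build_request_data_payload(start_time: int, end_time: int) -> str:
    """Return the JSON request body for a requestData call."""
    if end_time <= start_time:
        raise ValueError("endTime must be after startTime")
    if end_time - start_time > MAX_RANGE_SECONDS:
        raise ValueError("time range must not exceed 1 day")
    return json.dumps(
        {
            "query": REQUEST_DATA_MUTATION,
            "variables": {"startTime": start_time, "endTime": end_time},
        }
    )
```

Rejecting an oversized range on the client avoids a round trip for a request the service would refuse anyway.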

requestStatus (Mutation)

Checks export progress and retrieves download links on completion.

| Parameter | Type | Description |
| --- | --- | --- |
| requestId | String (UUID) | The Request ID from requestData |

Returns

| Field | Type | Description |
| --- | --- | --- |
| status | String | In Progress, Completed, or Request has not been requested |
| progress | Float | Completion percentage (0.0–100.0) |
| dataRequestDownloadLinks | JSON | { table_name: [url, url, ...], ... } (presigned URLs grouped by table) |

TIP

Presigned URLs are valid for 1 day only. Files are grouped by table name (e.g. { way: ["..."], cctv: ["..."] }).
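
A client polling requestStatus needs to tell "still running" apart from "finished" and "unknown request". The sketch below interprets one response object using the field names and status strings from the table above. `extract_download_links` is a hypothetical helper, and whether the JSON field arrives as a string or as an already-parsed object is an assumption, so both are handled.

```python
import json
from typing import Dict, List, Optional


def extract_download_links(result: dict) -> Optional[Dict[str, List[str]]]:
    """Return links grouped by table, or None while the export is running.

    Raises LookupError for any other status, including the
    "Request has not been requested" response for unknown IDs.
    """
    status = result.get("status")
    if status == "In Progress":
        return None
    if status != "Completed":
        raise LookupError(f"export not available: {status!r}")
    links = result["dataRequestDownloadLinks"]
    if isinstance(links, str):
        # Assumption: the JSON scalar may be serialised as a string.
        links = json.loads(links)
    return links
```

Because the URLs expire after 1 day, a caller would normally download the files soon after this function returns a result rather than storing the links.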

Data Classification

The config-driven export uses a multi-layered classification system to control what data is exported:

Classification Types

| Code | Meaning |
| --- | --- |
| nr | No restrictions on exporting |
| mx | Mixed: depends on source type |
| de | Do not export |
| hwc | Handle with care: export with restrictions |
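
One way to model these codes is an enum plus a small resolution step for mixed columns. This is an illustrative sketch only: the rule that a `mx` column inherits the classification of its source, and the conservative choice to block anything that stays unresolved, are assumptions rather than documented behaviour. `Classification`, `resolve`, and `may_export` are hypothetical names.

```python
from enum import Enum


class Classification(Enum):
    NO_RESTRICTIONS = "nr"
    MIXED = "mx"
    DO_NOT_EXPORT = "de"
    HANDLE_WITH_CARE = "hwc"


def resolve(column: Classification, source: Classification) -> Classification:
    """Assumed rule: a mixed column takes the classification of its source."""
    return source if column is Classification.MIXED else column


def may_export(column: Classification, source: Classification) -> bool:
    """Return True when the resolved classification permits export.

    Conservative assumption: only 'nr' and 'hwc' may leave the platform.
    A column that is still 'mx' after resolution is blocked.
    """
    effective = resolve(column, source)
    return effective in (
        Classification.NO_RESTRICTIONS,
        Classification.HANDLE_WITH_CARE,
    )
```

The restrictions that apply to `hwc` data are not specified here, so the sketch treats it as exportable and leaves any extra handling to the caller.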

Classification Tables

| Table | Purpose |
| --- | --- |
| data_column_classification | Per-column export rules for each database table |
| actual_data_classifications | Per-source export rules (e.g. FUSION = nr, HERE = de, TOMTOM = de) |
| long_table_data_classifications | Per-metric-name export rules for long-format tables |
| export_tables | Master table controlling which tables are enabled for export |
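
To illustrate how per-column and per-source rules might combine into a filtered query, the sketch below drops `de` columns and excludes rows from `de` sources. The rule dictionaries stand in for rows of the classification tables. The `source` column name, the generated SQL shape, and the function name `build_export_query` are all hypothetical; the real service may build its queries quite differently.

```python
from typing import Dict


def build_export_query(
    table: str,
    column_rules: Dict[str, str],
    source_rules: Dict[str, str],
    source_column: str = "source",  # hypothetical column name
) -> str:
    """Build a SELECT that omits 'de' columns and rows from 'de' sources."""
    columns = [name for name, code in column_rules.items() if code != "de"]
    if not columns:
        raise ValueError(f"no exportable columns in {table}")
    blocked = sorted(src for src, code in source_rules.items() if code == "de")
    query = f"SELECT {', '.join(columns)} FROM {table}"
    if blocked:
        literals = ", ".join(f"'{src}'" for src in blocked)
        query += f" WHERE {source_column} NOT IN ({literals})"
    return query
```

With the example rules from the table above (FUSION = nr, HERE = de, TOMTOM = de), the generated filter excludes HERE and TOMTOM rows. The sketch interpolates identifiers directly, which is only acceptable because the values are assumed to come from trusted configuration.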

Export Status Tracking

Each table export is tracked in table_export_job_statuses:

| Status | Meaning |
| --- | --- |
| IN-PROGRESS | Export query submitted to warehouse |
| COMPLETED | Export finished successfully |
| FAILED | Export failed after all retry attempts |
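
The status lifecycle, including the rule that FAILED is recorded only after every retry is exhausted, can be sketched as follows. `export_with_retries`, `run_export`, and `record` are hypothetical names, and recording IN-PROGRESS again on each attempt is an assumption about how the tracking table is updated.

```python
from typing import Callable


def export_with_retries(
    run_export: Callable[[], None],
    max_attempts: int,
    record: Callable[[str], None],
) -> str:
    """Run one table export, recording each status transition.

    run_export is assumed to raise an exception when the export fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for _ in range(max_attempts):
        record("IN-PROGRESS")  # query submitted to the warehouse
        try:
            run_export()
        except Exception:
            continue  # try again until attempts run out
        record("COMPLETED")
        return "COMPLETED"
    record("FAILED")  # only after every attempt has failed
    return "FAILED"
```

Passing `record` as a callable keeps the sketch independent of how table_export_job_statuses is actually written.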

S3 Storage Structure

Exported Parquet files are stored in a date-partitioned structure:

s3://{bucket}/{env}/data_exporter_service/
  └── {table_name}/
      └── YYYYMMDD/
          └── {dt_start}_{dt_end}/
              ├── {table_name}_000.parquet
              ├── {table_name}_001.parquet
              └── manifest

Config-driven exports use:

s3://{bucket}/{env}/
  └── YYYY/MM/DD/
      ├── {table_name}/
      │   └── *.parquet
      └── classifications/
          └── *.parquet
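
The two layouts above can be expressed as small path builders. This is a sketch under several assumptions that the page does not confirm: that the first layout belongs to on-demand exports, that {dt_start} and {dt_end} are epoch seconds, and that the YYYYMMDD folder is the UTC date of the start time. All function names are hypothetical.

```python
from datetime import date, datetime, timezone


def on_demand_prefix(
    bucket: str, env: str, table: str, dt_start: int, dt_end: int
) -> str:
    """Prefix for the first layout; the day folder is assumed to be
    the UTC date of dt_start."""
    day = datetime.fromtimestamp(dt_start, tz=timezone.utc).strftime("%Y%m%d")
    return (
        f"s3://{bucket}/{env}/data_exporter_service/"
        f"{table}/{day}/{dt_start}_{dt_end}/"
    )


def part_file_name(table: str, index: int) -> str:
    """File name with a zero-padded three-digit part number."""
    return f"{table}_{index:03d}.parquet"


def config_driven_prefix(bucket: str, env: str, table: str, day: date) -> str:
    """Prefix for the config-driven layout, partitioned as YYYY/MM/DD."""
    return f"s3://{bucket}/{env}/{day:%Y/%m/%d}/{table}/"
```

Note that the two layouts order their components differently: the first nests the date under the table name, while the config-driven layout nests tables under the date.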

Exploratory Data Analysis (EDA)

Data Exporter supports EDA workflows through query service integration. The exported data is cataloged and made queryable for ad-hoc analysis without impacting the production warehouse. This enables analysts to explore historical data using standard query languages while the export process continues to serve regular scheduled extractions.

Known Constraints

  • String length limits — The data warehouse has maximum character length constraints. Large geospatial fields are optimized through coordinate rounding and compression.
  • Duration limit — On-demand exports cannot span more than 1 day.
  • Concurrency — Export operations are limited to prevent warehouse overload during peak usage.
  • Retry behaviour — Failed export operations are retried a configured number of times before being marked as failed.
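
To make the coordinate rounding and compression idea from the string length constraint concrete, here is one possible approach. It is not the platform's actual encoding: the five-decimal precision, the JSON serialisation, and the zlib plus Base64 encoding are all assumptions chosen for illustration, and both function names are hypothetical.

```python
import base64
import json
import zlib
from typing import List


def compact_coordinates(coords: List[List[float]], decimals: int = 5) -> str:
    """Round each coordinate pair, then compress to an ASCII-safe string."""
    rounded = [[round(x, decimals), round(y, decimals)] for x, y in coords]
    raw = json.dumps(rounded, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def expand_coordinates(encoded: str) -> List[List[float]]:
    """Reverse compact_coordinates; rounding is lossy and not restored."""
    return json.loads(zlib.decompress(base64.b64decode(encoded)))
```

Rounding shortens every number before compression, so the two steps compound. The trade-off is a loss of precision, which a caller would need to confirm is acceptable for the data being exported.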

Troubleshooting

Request Progress Not Advancing

Cause: The Data Exporter pod was terminated because of too many concurrent requests, so in-flight export operations could not complete.

Resolution:

  1. Note the Request ID of the stalled export request
  2. Contact your platform administrator and ask them to clear the stalled request from the system
  3. The platform administrator restarts the Data Exporter service
  4. Resubmit the export request with the same start and end times

TIP

Platform administrators have access to diagnostic tools for clearing stalled export requests. If you experience repeated stalls, contact support to investigate concurrency limits and resource allocation.

Related Services

  • Post Monitoring — Upstream ETL service that maintains the warehouse schemas Data Exporter reads from
  • Batch Ingestion — Manages reference data that Data Exporter may export; responsible for compressed field generation
  • Data Archiver — Manages PostgreSQL data lifecycle; data must be exported before archival
  • Data Recorder — Writes the live data snapshots that eventually flow through Post Monitoring to Data Exporter
  • Experiment Manager — Central coordination service providing the API for on-demand export triggers
  • Baseline Manager — Generates baseline data that may be included in exports

User documentation for Optimal Reality