Data Exporter
Overview
Data Exporter is the OR platform's data extraction and delivery service — responsible for exporting operational and analytical data from the platform's data warehouse and making it available to external consumers via secure download links. It serves as the primary interface between the platform's internal data stores and external analytics environments.
The service operates in two modes: an on-demand API-triggered export that generates data files for a requested time range, and a scheduled config-driven export that runs daily. Both modes extract data from the analytics warehouse, apply PII removal and data classification filters, write files to cloud storage, and expose download links through the platform's API.
Data Exporter is tightly coupled with Post Monitoring, which maintains the warehouse schemas that Data Exporter reads from. Together, they form the platform's historical data pipeline — Post Monitoring handles the ETL into the warehouse, while Data Exporter handles the extraction to external consumers.
Architecture
Both export modes integrate with the platform's data warehouse, cloud storage, and central orchestration service. On-demand exports are triggered through the platform API; scheduled exports run daily without a request.
Key Components
- On-demand Export — API-triggered export flow. Accepts a time range, generates a request ID, executes warehouse queries to create data files, and returns secure download URLs.
- Config-driven Export — Scheduled export flow running daily. Reads export configuration, applies data classification rules, and exports tables to cloud storage.
- PII Removal — Strips personally identifiable information during the export transformation step before data leaves the platform boundary.
- Data Classification Engine — Filters exported columns and rows based on classification rules (no restrictions, mixed, do not export, handle with care).
- Warehouse Export — Executes optimized warehouse queries to export results directly to cloud storage in efficient file formats.
Connections
| Direction | Service | Purpose |
|---|---|---|
| In | Central Orchestration Service | Triggers on-demand export requests |
| In | Automated Scheduler | Triggers scheduled daily exports |
| In | Data Warehouse | Source data for exports |
| Out | Cloud Storage | Data file storage and secure URL generation |
| Out | Configuration Store | Export configuration (config-driven mode) |
| Out | Notification Service | Success/failure notifications for scheduled exports |
Data Flow
On-demand Export
External requesters (such as analytics platforms) submit export requests with a time range through the platform API. The Data Exporter generates a request ID, queries the data warehouse with PII removal filters, writes the results to cloud storage, and tracks progress. Upon completion, it generates secure download URLs (valid for 1 day) grouped by table and returns them to the requester.
Config-driven Export
The scheduled export runs daily, reading export configuration and data classification rules. For each enabled table, it builds filtered queries, applies classification filters, exports data to date-partitioned cloud storage, and updates status tracking. After all tables complete, it sends success/failure notifications.
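As a rough illustration of the date-partitioned layout used by the scheduled export, the sketch below builds the target prefix for each enabled table. The path pattern follows the config-driven layout documented under S3 Storage Structure; the function names and the `export_tables` mapping shape are hypothetical and chosen only for this example.

```python
from datetime import date
from typing import Dict


def config_export_prefix(bucket: str, env: str, run_date: date, table_name: str) -> str:
    """Build the date-partitioned prefix for one table of a scheduled export."""
    # Layout from the S3 Storage Structure section:
    #   s3://{bucket}/{env}/YYYY/MM/DD/{table_name}/
    return f"s3://{bucket}/{env}/{run_date:%Y/%m/%d}/{table_name}/"


def plan_daily_export(
    bucket: str, env: str, run_date: date, export_tables: Dict[str, bool]
) -> Dict[str, str]:
    """Return the target prefix for every enabled table.

    `export_tables` maps table name -> enabled flag (an assumed shape,
    not the real schema of the export_tables table).
    """
    return {
        name: config_export_prefix(bucket, env, run_date, name)
        for name, enabled in export_tables.items()
        if enabled
    }
```

Disabled tables are simply left out of the plan, which mirrors the "for each enabled table" step in the flow above.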
GraphQL Endpoints
requestData (Mutation)
Triggers the on-demand data export process.
| Parameter | Type | Description |
|---|---|---|
| startTime | Integer (epoch) | Start of the export time range |
| endTime | Integer (epoch) | End of the export time range (max 1 day from start) |
| Returns | String | Request ID (UUID) for status tracking |
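A client can check the 1-day limit before calling the mutation. The sketch below validates the range and builds a GraphQL request body. It assumes epoch values are in seconds and that the mutation takes `startTime` and `endTime` as integer arguments; the exact schema types, the operation name `RequestData`, and the helper name are assumptions, not confirmed API details.

```python
import json

ONE_DAY_SECONDS = 24 * 60 * 60  # assumption: epoch values are in seconds


def build_request_data_payload(start_time: int, end_time: int) -> str:
    """Validate the time range and return a JSON-encoded GraphQL request body."""
    if end_time <= start_time:
        # Assumed sanity check; the service may report this differently.
        raise ValueError("endTime must be later than startTime")
    if end_time - start_time > ONE_DAY_SECONDS:
        raise ValueError("on-demand exports cannot span more than 1 day")
    # Assumed mutation shape: the real schema may use different types.
    query = (
        "mutation RequestData($startTime: Int!, $endTime: Int!) {"
        " requestData(startTime: $startTime, endTime: $endTime) }"
    )
    return json.dumps(
        {"query": query, "variables": {"startTime": start_time, "endTime": end_time}}
    )
```

The returned string can be posted to the platform's GraphQL endpoint with any HTTP client; the response is expected to carry the Request ID used for status tracking.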
requestStatus (Mutation)
Checks export progress and retrieves download links on completion.
| Parameter | Type | Description |
|---|---|---|
| requestId | String (UUID) | The Request ID from requestData |
| Returns | | |
| status | String | In Progress, Completed, or Request has not been requested |
| progress | Float | Completion percentage (0.0–100.0) |
| dataRequestDownloadLinks | JSON | { table_name: [url, url, ...], ... } (presigned URLs grouped by table) |
TIP
Presigned URLs are valid for 1 day only. Files are grouped by table name (e.g. { way: ["..."], cctv: ["..."] }).
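Because the export runs asynchronously, callers typically poll requestStatus until it reports Completed. A minimal polling sketch follows. The status strings and field names come from the table above; `fetch_status` is a hypothetical caller-supplied function that performs the actual GraphQL call and returns the decoded response fields, and the polling interval and limit are arbitrary.

```python
import json
import time
from typing import Callable, Dict, List


def wait_for_download_links(
    fetch_status: Callable[[str], dict],
    request_id: str,
    poll_seconds: float = 5.0,
    max_polls: int = 120,
) -> Dict[str, List[str]]:
    """Poll until the export completes, then return URLs grouped by table."""
    for _ in range(max_polls):
        result = fetch_status(request_id)
        status = result["status"]
        if status == "Completed":
            links = result["dataRequestDownloadLinks"]
            # The JSON field may arrive as a string or as a decoded object.
            return json.loads(links) if isinstance(links, str) else links
        if status != "In Progress":
            # For example "Request has not been requested" for an unknown ID.
            raise LookupError(f"request {request_id}: {status}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"request {request_id} still in progress after {max_polls} polls")
```

Since the presigned URLs expire after 1 day, the files should be downloaded soon after the links are returned rather than stored for later use.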
Data Classification
The config-driven export uses a multi-layered classification system to control what data is exported:
Classification Types
| Code | Meaning |
|---|---|
| nr | No restrictions on exporting |
| mx | Mixed — depends on source type |
| de | Do not export |
| hwc | Handle with care — export with restrictions |
Classification Tables
| Table | Purpose |
|---|---|
| data_column_classification | Per-column export rules for each database table |
| actual_data_classifications | Per-source export rules (e.g. FUSION = nr, HERE = de, TOMTOM = de) |
| long_table_data_classifications | Per-metric-name export rules for long-format tables |
| export_tables | Master table controlling which tables are enabled for export |
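To make the effect of the classification codes concrete, the sketch below filters columns and sources by code. It is only an illustration: the source does not specify how mx and hwc are resolved, so this example assumes they are excluded unless the caller explicitly allows a column. The function names and the dictionary inputs are hypothetical stand-ins for the classification tables.

```python
from typing import Dict, Iterable, List

EXPORTABLE = {"nr"}            # no restrictions
NEEDS_REVIEW = {"mx", "hwc"}   # mixed / handle with care


def exportable_columns(
    column_codes: Dict[str, str], allow_reviewed: Iterable[str] = ()
) -> List[str]:
    """Select the columns that may leave the platform.

    Assumption: mx and hwc columns are exported only when explicitly
    listed in `allow_reviewed`; the real rules may be more detailed.
    """
    allowed = set(allow_reviewed)
    selected = []
    for column, code in column_codes.items():
        if code in EXPORTABLE:
            selected.append(column)
        elif code in NEEDS_REVIEW and column in allowed:
            selected.append(column)
        # "de" and unknown codes are never exported.
    return selected


def exportable_sources(source_codes: Dict[str, str]) -> List[str]:
    """Keep only sources classified as nr (per-source rules)."""
    return [source for source, code in source_codes.items() if code == "nr"]
```

With the per-source example from the table above (FUSION = nr, HERE = de, TOMTOM = de), only FUSION rows would pass the source filter.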
Export Status Tracking
Each table export is tracked in table_export_job_statuses:
| Status | Meaning |
|---|---|
| IN-PROGRESS | Export query submitted to warehouse |
| COMPLETED | Export finished successfully |
| FAILED | Export failed after all retry attempts |
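The status transitions above can be sketched as a small retry loop. This is an illustrative sketch, not the service's implementation: `run_query` and `record_status` are hypothetical callables standing in for the warehouse export and the write to table_export_job_statuses, and the attempt count is whatever the deployment configures.

```python
from typing import Callable


def run_table_export(
    run_query: Callable[[], None],
    record_status: Callable[[str], None],
    max_attempts: int,
) -> str:
    """Run one table export, retrying on failure, and record its status."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    record_status("IN-PROGRESS")
    for _ in range(max_attempts):
        try:
            run_query()
        except Exception:
            # Any failure counts against the retry budget; try again.
            continue
        record_status("COMPLETED")
        return "COMPLETED"
    # Every attempt raised, so the job is marked as failed.
    record_status("FAILED")
    return "FAILED"
```

A job therefore only reaches FAILED once the full retry budget is used up, which matches the meaning given in the table.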
S3 Storage Structure
Exported Parquet files are stored in a date-partitioned structure:
s3://{bucket}/{env}/data_exporter_service/
└── {table_name}/
    └── YYYYMMDD/
        └── {dt_start}_{dt_end}/
            ├── {table_name}_000.parquet
            ├── {table_name}_001.parquet
            └── manifest
Config-driven exports use:
s3://{bucket}/{env}/
└── YYYY/MM/DD/
    ├── {table_name}/
    │   └── *.parquet
    └── classifications/
        └── *.parquet
Exploratory Data Analysis (EDA)
Data Exporter supports EDA workflows through query service integration. The exported data is cataloged and made queryable for ad-hoc analysis without impacting the production warehouse. This enables analysts to explore historical data using standard query languages while the export process continues to serve regular scheduled extractions.
Known Constraints
- String length limits — The data warehouse enforces maximum character lengths on string columns. Large geospatial fields are reduced in size through coordinate rounding and compression so that they fit within these limits.
- Duration limit — On-demand exports cannot span more than 1 day.
- Concurrency — Export operations are limited to prevent warehouse overload during peak usage.
- Retry behaviour — Failed export operations are retried a configured number of times before being marked as failed.
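One common way to enforce a concurrency limit like the one listed above is a fixed-size worker pool. The sketch below uses Python's standard `ThreadPoolExecutor`; it is an assumption about how such a cap could be applied, not a description of the service's actual mechanism, and `export_one` is a hypothetical per-table export function.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_exports_bounded(
    tables: Iterable[str],
    export_one: Callable[[str], str],
    max_concurrent: int,
) -> List[str]:
    """Export tables with at most `max_concurrent` running at once."""
    # The pool size caps how many warehouse queries are in flight together.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # map() returns results in the same order as the input tables.
        return list(pool.map(export_one, tables))
```

Additional tables wait in the pool's queue until a worker is free, so the warehouse never sees more than the configured number of simultaneous export queries from this process.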
Troubleshooting
Request Progress Not Advancing
Cause: The Data Exporter pod was terminated because of too many concurrent requests, so in-flight export operations never completed.
Resolution:
- Note the Request ID of the stalled export request
- Contact your platform administrator and ask them to clear the stalled request from the system
- The platform administrator restarts the Data Exporter service
- Resubmit the export request with the same start and end times
TIP
Platform administrators have access to diagnostic tools for clearing stalled export requests. If you experience repeated stalls, contact support to investigate concurrency limits and resource allocation.
Related Services
- Post Monitoring — Upstream ETL service that maintains the warehouse schemas Data Exporter reads from
- Batch Ingestion — Manages reference data that Data Exporter may export; responsible for compressed field generation
- Data Archiver — Manages PostgreSQL data lifecycle; data must be exported before archival
- Data Recorder — Writes the live data snapshots that eventually flow through Post Monitoring to Data Exporter
- Experiment Manager — Central coordination service providing the API for on-demand export triggers
- Baseline Manager — Generates baseline data that may be included in exports
