Data Archiver
Overview
Data Archiver manages the final stage of the OR platform's data lifecycle: moving aged operational data out of PostgreSQL and into long-term cold storage on S3. The Data Recorder continuously writes live data snapshots to PostgreSQL, so without active management the database would grow indefinitely. Data Archiver prevents this by identifying data older than a configured threshold, exporting it as CSV files to S3, and then removing it from the database.
This keeps PostgreSQL queries fast for the real-time operations that depend on them, and no data is lost: it is moved to a more cost-effective storage tier. The archived CSV files on S3 remain available for historical analysis, compliance, and audit purposes.
Data Archiver operates independently of most other microservices, requiring only the Experiment Manager (GraphQL) for session triggers and database access.
Architecture
The service operates as a dedicated archival system for managing the platform's historical data lifecycle.
Key Components
- Session-triggered archiving — Subscribes to session updates from the central orchestration service. Archiving runs on a 24-hour session cycle.
- Configurable age threshold — Data older than a specified duration (e.g. one week) is identified for archiving.
- Cloud storage export — Aged data is exported and stored in cloud storage before being removed from the primary database.
- Table-by-table processing — Iterates through all configured data tables, processing each independently.
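As an illustration of the age threshold, the following minimal sketch computes the cutoff timestamp for a one-week threshold. The function names `archive_cutoff` and `is_archivable` are hypothetical and are not taken from the service's code; the sketch also assumes timestamps are timezone-aware UTC values.

```python
from datetime import datetime, timedelta, timezone


def archive_cutoff(now: datetime, max_age: timedelta) -> datetime:
    """Return the timestamp before which rows qualify for archiving."""
    if max_age <= timedelta(0):
        raise ValueError("max_age must be positive")
    return now - max_age


def is_archivable(recorded_at: datetime, cutoff: datetime) -> bool:
    """A row qualifies only if it is strictly older than the cutoff."""
    return recorded_at < cutoff


# Example: a one-week threshold evaluated at a fixed point in time.
now = datetime(2024, 1, 15, tzinfo=timezone.utc)
cutoff = archive_cutoff(now, timedelta(weeks=1))
print(cutoff.isoformat())
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the calculation deterministic and easy to test.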
Connections
| Direction | Service | Purpose |
|---|---|---|
| In | Central Orchestration Service | Session subscription triggers |
| Out | Central Orchestration Service | Queries and completion notices |
| Out | Primary Database | Query and delete aged data |
| Out | Cloud Storage | Store archived data files |
Data Flow
The Session Manager triggers the Data Archiver on a 24-hour cycle through the central orchestration service. For each configured data table, the archiver queries for data older than the threshold, exports it to cloud storage, and removes it from the primary database. Upon completion, it posts a notice to the orchestration service to trigger downstream processes like baseline creation.
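The per-table flow described above (query, export, delete) might look roughly like the sketch below. It is not the service's implementation: an in-memory SQLite database stands in for PostgreSQL, a plain dictionary stands in for the S3 bucket, and the table name `way_metrics`, the column `recorded_at`, and the object key layout are all assumptions made for illustration.

```python
import csv
import io
import sqlite3


def archive_table(conn, table, cutoff_iso, store):
    """Export rows older than the cutoff as CSV, then delete them.

    Returns the number of rows archived. `store` is a dict standing in
    for the cloud storage bucket (assumption for this sketch).
    """
    # The table name comes from trusted configuration; identifiers
    # cannot be bound as query parameters, so it is interpolated.
    cur = conn.execute(
        f'SELECT * FROM "{table}" WHERE recorded_at < ? ORDER BY recorded_at',
        (cutoff_iso,),
    )
    header = [col[0] for col in cur.description]
    rows = cur.fetchall()
    if not rows:
        return 0

    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)

    # Upload first; delete only after the export has been stored.
    store[f"{table}/{cutoff_iso}.csv"] = buf.getvalue()
    with conn:  # commits the delete, or rolls back on error
        conn.execute(
            f'DELETE FROM "{table}" WHERE recorded_at < ?', (cutoff_iso,)
        )
    return len(rows)


# Demo with two rows, one on each side of the cutoff.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE way_metrics (recorded_at TEXT, speed REAL)")
conn.executemany(
    "INSERT INTO way_metrics VALUES (?, ?)",
    [("2024-01-01T00:00:00", 50.0), ("2024-01-10T00:00:00", 62.5)],
)
conn.commit()

store = {}
archived = archive_table(conn, "way_metrics", "2024-01-08T00:00:00", store)
print(archived, sorted(store))
```

Ordering the steps as export, then delete means a failed upload leaves the rows in the database, so they are picked up again on the next cycle rather than lost.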
Sequence of Operations
- Service starts — Initializes health monitoring and session subscription
- Subscription begins — Subscribes to session updates from configuration
- Session tick (24 hours) — For each configured table:
  - Query database for aged data
  - Export data and upload to cloud storage
  - Remove archived data from primary database
- Completion notice — Posts archive completion (triggers downstream processes like baseline creation)
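One way to picture a single session tick is the sketch below, which processes each table independently and then posts a completion notice. The function `run_archive_cycle`, the callback signatures, the table names, and the shape of the summary are hypothetical. In particular, posting the notice even when a table fails (and reporting the failure in it) is an assumption of this sketch, not documented behaviour.

```python
from typing import Callable, Dict, Iterable


def run_archive_cycle(
    tables: Iterable[str],
    archive_one: Callable[[str], int],
    notify: Callable[[Dict[str, object]], None],
) -> Dict[str, object]:
    """Archive each table independently, then post a completion notice."""
    archived: Dict[str, int] = {}
    failed: Dict[str, str] = {}
    for table in tables:
        try:
            archived[table] = archive_one(table)
        except Exception as exc:
            # A failure in one table does not stop the remaining tables.
            failed[table] = str(exc)
    summary: Dict[str, object] = {"archived": archived, "failed": failed}
    notify(summary)
    return summary


# Demo with a stub archiver in which one table fails.
def fake_archive(table: str) -> int:
    if table == "events":
        raise RuntimeError("upload failed")
    return 3


notices = []
summary = run_archive_cycle(
    ["segment_metrics", "events", "sensor_readings"],
    fake_archive,
    notices.append,
)
print(summary)
```

In a real deployment `archive_one` would wrap the query, export, and delete steps, and `notify` would call the orchestration service.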
Archived Data Types
Data Archiver processes the following operational data:
| Data Type | Content |
|---|---|
| Segment metrics | Segment-level metric snapshots |
| Way metrics | Way-level speed, volume, flow snapshots |
| Events | Incident and event records |
| Camera status | Camera status snapshots |
| Intersection signals | Traffic signal data |
| Intersection groups | Grouped intersection data |
| Sensor readings | Sensor measurement data |
| Sign displays | Variable message sign display states |
Configuration
Archiving behaviour is controlled through configuration:
- Cloud storage destination — Target location for archived data files
- Session subscription — Which session to listen to for triggering the archive cycle
- Age threshold — How old data must be before it qualifies for archiving
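The three settings above could be modelled as a small validated structure, as in this sketch. The class `ArchiverConfig`, the field names, the raw keys such as `age_threshold_days`, and the example values are illustrative assumptions; the service's actual configuration format is not described here.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class ArchiverConfig:
    storage_destination: str  # target location for archived files
    session_name: str         # session whose ticks trigger archiving
    age_threshold: timedelta  # minimum age before data is archived

    def __post_init__(self) -> None:
        if not self.storage_destination:
            raise ValueError("storage_destination must not be empty")
        if not self.session_name:
            raise ValueError("session_name must not be empty")
        if self.age_threshold <= timedelta(0):
            raise ValueError("age_threshold must be positive")


def load_config(raw: dict) -> ArchiverConfig:
    """Build a validated config from a plain dictionary."""
    return ArchiverConfig(
        storage_destination=raw["storage_destination"],
        session_name=raw["session_name"],
        age_threshold=timedelta(days=float(raw["age_threshold_days"])),
    )


# Example values are placeholders, not real bucket or session names.
cfg = load_config(
    {
        "storage_destination": "s3://example-bucket/archive",
        "session_name": "daily-archive",
        "age_threshold_days": 7,
    }
)
print(cfg)
```

Validating at load time means a missing destination or a non-positive threshold is rejected at startup rather than during an archive cycle.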
Related Services
- Data Recorder — Upstream service that writes the live data snapshots that Data Archiver eventually archives
- Data Loader — Loads the reference data for tables that Data Archiver manages
- Experiment Manager — Central coordination service providing session triggers and completion signalling
