Data Archiver
Overview
Data Archiver manages the final stage of the OR platform's data lifecycle — moving aged operational data out of PostgreSQL and into long-term cold storage on S3. As the Data Recorder continuously writes live data snapshots to PostgreSQL, the database would grow indefinitely without active management. Data Archiver prevents this buildup by identifying data older than a configured threshold, exporting it as CSV files to S3, and then removing it from the database.
This autonomous lifecycle management ensures that PostgreSQL queries remain fast and responsive for the real-time operations that depend on them, while no data is lost — it is simply moved to a more cost-effective storage tier. The archived CSV files on S3 remain available for historical analysis, compliance, and audit purposes.
Data Archiver operates independently of most other microservices, requiring only the Experiment Manager (GraphQL) for session triggers and database access.
Architecture
- Port: :9500
- Language: Julia
- Scaling: Singleton
Key Components
- Session-triggered archiving — Subscribes to session updates from the Experiment Manager via GraphQL. Archiving runs on a 24-hour session cycle.
- Configurable age threshold — Data older than a specified duration (e.g. one week) is identified for archiving.
- CSV export to S3 — Aged data is exported as CSV files and stored in a configured S3 bucket before being removed from PostgreSQL.
- Table-by-table processing — Iterates through all configured actuals tables, processing each independently.
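The age-threshold selection can be sketched in Julia. The `recorded_at` column name and both helper functions are illustrative assumptions, not the service's actual schema or API:

```julia
using Dates

# Hypothetical helper: compute the cutoff timestamp for archiving.
# `threshold` mirrors the configurable age threshold (e.g. one week);
# rows with a timestamp earlier than the returned cutoff qualify.
archive_cutoff(now::DateTime, threshold::Period) = now - threshold

# Hypothetical query builder for one actuals table. The table name comes
# from the configured table list; the cutoff is bound as a parameter.
aged_rows_sql(table::String) =
    "SELECT * FROM $table WHERE recorded_at < \$1"
```

For example, with a one-week threshold, `archive_cutoff(DateTime(2024, 1, 8), Week(1))` yields `DateTime(2024, 1, 1)`, and every configured table is queried with the same cutoff.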
Connections
| Direction | Service | Purpose |
|---|---|---|
| In | Experiment Manager (GraphQL) | Session subscription triggers |
| Out | Experiment Manager (GraphQL) | Queries and completion notices |
| Out | PostgreSQL (RDS) | Query and delete aged data |
| Out | AWS S3 | Store archived CSV files |
Data Flow
Session Manager → Experiment Manager (24-hour session tick)
↓ (GraphQL subscription)
Data Archiver [:9500]
├── Thread 1: HTTP server (health checks, metrics)
└── Thread 2: Subscription controller
↓ (on session trigger)
For each actuals table:
├── Query PostgreSQL for data older than threshold
├── Export to CSV
├── Upload CSV to S3
└── Delete archived rows from PostgreSQL
↓
Post completion notice to GraphQL
Sequence of Operations
- Pod deploys — Thread 1 initialises and starts the HTTP server
- Subscription starts — Thread 2 subscribes to session details from config.yaml
- Session tick (24 hours) — For each configured table:
- Preprocess and query RDS for aged data
- Export data as CSV and upload to S3
- Remove archived data from PostgreSQL
- Completion notice — Posts archive completion to GraphQL (triggers downstream processes like baseline creation)
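The per-table cycle above can be sketched in Julia, with the PostgreSQL, S3, and GraphQL calls stubbed out as injected functions. All names here are assumptions for illustration, not the service's actual API; the key property shown is the ordering — rows are only deleted after the CSV upload succeeds:

```julia
using Dates

# Sketch of one table's archive cycle. `rows` stands in for the query
# result; each row is assumed to carry a `ts` timestamp field.
function archive_table!(rows, cutoff::DateTime;
                        export_csv, upload, delete_rows)
    aged = filter(r -> r.ts < cutoff, rows)   # data older than threshold
    isempty(aged) && return 0
    csv = export_csv(aged)    # serialise the aged rows to CSV
    upload(csv)               # push the file to the configured S3 bucket
    delete_rows(aged)         # only then remove the rows from PostgreSQL
    return length(aged)
end
```

In the service itself this would run once per configured actuals table on each session tick, followed by the completion notice to GraphQL.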
Archived Tables
Data Archiver processes the following PostgreSQL actuals tables:
| Table | Content |
|---|---|
| segment_actual | Segment-level metric snapshots |
| way_actual | Way-level speed, volume, and flow snapshots |
| event | Incident and event records |
| env_cctv_actual | CCTV camera status snapshots |
| env_intersection_actual | SCATS intersection signal data |
| env_intersection_group_actual | Grouped intersection data |
| env_sensor_actual | Sensor readings |
| env_sign_actual | VMS and sign display states |
Configuration
Archiving behaviour is controlled through config.yaml:
- S3 bucket name — Target bucket for archived CSV files
- Session subscription — Which session to listen to for triggering the archive cycle
- Age threshold — How old data must be before it qualifies for archiving
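The settings above might look like the following config.yaml fragment — the key names and layout are assumptions for illustration, not the service's actual schema:

```yaml
# Illustrative only: actual key names may differ.
archiver:
  s3_bucket: my-archive-bucket     # target bucket for archived CSV files
  session: daily-archive           # session subscription that triggers the cycle
  age_threshold: 1w                # data older than this qualifies for archiving
```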
Related Services
- Data Recorder — Upstream service that writes the live data snapshots that Data Archiver eventually archives
- Data Loader — Loads the reference data for tables that Data Archiver manages
- Experiment Manager — Central coordination service (GraphQL on :5100); provides session triggers and completion signalling
