
Data Archiver


Overview

Data Archiver manages the final stage of the OR platform's data lifecycle — moving aged operational data out of PostgreSQL and into long-term cold storage on S3. As the Data Recorder continuously writes live data snapshots to PostgreSQL, the database would grow indefinitely without active management. Data Archiver prevents this buildup by identifying data older than a configured threshold, exporting it as CSV files to S3, and then removing it from the database.

This autonomous lifecycle management keeps PostgreSQL queries fast and responsive for the real-time operations that depend on them, without losing any data: records are simply moved to a more cost-effective storage tier. The archived CSV files on S3 remain available for historical analysis, compliance, and audit purposes.

Data Archiver operates independently of most other microservices, requiring only the Experiment Manager (GraphQL) for session triggers and database access.

Architecture

  • Port: :9500
  • Language: Julia
  • Scaling: Singleton

Key Components

  • Session-triggered archiving — Subscribes to session updates from the Experiment Manager via GraphQL. Archiving runs on a 24-hour session cycle.
  • Configurable age threshold — Data older than a specified duration (e.g. one week) is identified for archiving.
  • CSV export to S3 — Aged data is exported as CSV files and stored in a configured S3 bucket before being removed from PostgreSQL.
  • Table-by-table processing — Iterates through all configured actuals tables, processing each independently.
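
The age threshold above boils down to computing a single cutoff timestamp that drives both the export query and the delete. A minimal Python sketch (the service itself is written in Julia, and the `recorded_at` column name is an assumption):

```python
from datetime import datetime, timedelta, timezone

def cutoff_for(threshold, now=None):
    """Rows recorded strictly before this instant qualify for archiving."""
    now = now or datetime.now(timezone.utc)
    return now - threshold

# One-week threshold, as in the example above; fixed "now" for illustration.
now = datetime(2024, 6, 8, 12, 0, tzinfo=timezone.utc)
cutoff = cutoff_for(timedelta(weeks=1), now)

# The same cutoff parameter drives both the export query and the delete,
# so the two always agree on exactly which rows were archived
# ("recorded_at" is an assumed column name).
select_sql = "SELECT * FROM segment_actual WHERE recorded_at < %s"
delete_sql = "DELETE FROM segment_actual WHERE recorded_at < %s"
```

Using one cutoff value for both statements is what makes the "export then delete" pair safe: the delete can never remove a row the export did not see.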

Connections

Direction  Service                       Purpose
In         Experiment Manager (GraphQL)  Session subscription triggers
Out        Experiment Manager (GraphQL)  Queries and completion notices
Out        PostgreSQL (RDS)              Query and delete aged data
Out        AWS S3                        Store archived CSV files
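
The outbound completion notice is an ordinary GraphQL mutation over HTTP. A hedged Python sketch of building such a request body — the operation and field names are hypothetical, since the Experiment Manager's real schema is not documented here:

```python
import json

# Hypothetical mutation — the Experiment Manager's actual schema is not
# shown in this document, so these names are purely illustrative.
COMPLETION_MUTATION = """
mutation ArchiveComplete($session: ID!, $tables: [String!]!) {
  archiveComplete(sessionId: $session, tables: $tables) { ok }
}
"""

def completion_payload(session_id, tables):
    """Build the JSON body for a GraphQL completion-notice POST."""
    return json.dumps({
        "query": COMPLETION_MUTATION,
        "variables": {"session": session_id, "tables": tables},
    })

body = completion_payload("daily-archive", ["segment_actual", "way_actual"])
```

In the real service this body would be POSTed to the Experiment Manager's GraphQL endpoint (:5100).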

Data Flow

Session Manager → Experiment Manager (24-hour session tick)
        ↓ (GraphQL subscription)
Data Archiver [:9500]
  ├── Thread 1: HTTP server (health checks, metrics)
  └── Thread 2: Subscription controller
        ↓ (on session trigger)
        For each actuals table:
          ├── Query PostgreSQL for data older than threshold
          ├── Export to CSV
          ├── Upload CSV to S3
          └── Delete archived rows from PostgreSQL

        Post completion notice to GraphQL
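
The per-table loop above can be sketched as follows. This is Python (the service is Julia), with SQLite standing in for PostgreSQL and a plain callable standing in for the S3 upload; table and column names are illustrative:

```python
import csv
import io
import sqlite3

def archive_table(conn, table, cutoff, upload):
    """Export rows older than `cutoff` as CSV, upload them, then delete them.
    `upload(key, data)` stands in for the real S3 PUT."""
    cur = conn.execute(f"SELECT * FROM {table} WHERE recorded_at < ?", (cutoff,))
    rows = cur.fetchall()
    if not rows:
        return 0
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(col[0] for col in cur.description)  # header row
    writer.writerows(rows)
    upload(f"archive/{table}/{cutoff}.csv", buf.getvalue())
    # Delete only after the upload has succeeded, so nothing is ever lost.
    conn.execute(f"DELETE FROM {table} WHERE recorded_at < ?", (cutoff,))
    conn.commit()
    return len(rows)

# Demo against an in-memory SQLite table (schema is illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE segment_actual (id INTEGER, recorded_at TEXT, speed REAL)")
conn.executemany("INSERT INTO segment_actual VALUES (?, ?, ?)",
                 [(1, "2024-05-20", 55.0), (2, "2024-06-07", 60.0)])
archived = {}
n = archive_table(conn, "segment_actual", "2024-06-01",
                  lambda key, data: archived.__setitem__(key, data))
```

The ordering matters: uploading before deleting means a failed upload leaves the rows in PostgreSQL for the next cycle to retry, rather than losing them.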

Sequence of Operations

  1. Pod deploys — Thread 1 initialises and starts the HTTP server
  2. Subscription starts — Thread 2 subscribes to updates for the session specified in config.yaml
  3. Session tick (24 hours) — For each configured table:
    • Preprocess and query RDS for aged data
    • Export data as CSV and upload to S3
    • Remove archived data from PostgreSQL
  4. Completion notice — Posts archive completion to GraphQL (triggers downstream processes like baseline creation)
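
The two-thread startup in steps 1–2 can be sketched in Python (the service itself is Julia); the `/health` path and the trigger mechanics are assumptions:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    """Thread 1's job: answer health checks so the pod passes its probes."""
    def do_GET(self):
        if self.path == "/health":   # probe path is an assumption
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):    # silence per-request logging
        pass

# Thread 1: HTTP server (port 0 picks any free port for this sketch).
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Thread 2: subscription controller, stubbed with an Event in place of a
# real GraphQL subscription.
trigger = threading.Event()
done = threading.Event()

def subscription_controller():
    trigger.wait()   # real service: block on the session subscription
    # ... run the per-table archive cycle here ...
    done.set()       # real service: post the completion notice

threading.Thread(target=subscription_controller, daemon=True).start()
```

Simulating a session tick is then just `trigger.set()`, after which the controller runs its cycle and signals completion.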

Archived Tables

Data Archiver processes the following PostgreSQL actuals tables:

Table                          Content
segment_actual                 Segment-level metric snapshots
way_actual                     Way-level speed, volume, flow snapshots
event                          Incident and event records
env_cctv_actual                CCTV camera status snapshots
env_intersection_actual        SCATS intersection signal data
env_intersection_group_actual  Grouped intersection data
env_sensor_actual              Sensor readings
env_sign_actual                VMS and sign display states

Configuration

Archiving behaviour is controlled through config.yaml:

  • S3 bucket name — Target bucket for archived CSV files
  • Session subscription — Which session to listen to for triggering the archive cycle
  • Age threshold — How old data must be before it qualifies for archiving

Related Services

  • Data Recorder — Upstream service that writes the live data snapshots that Data Archiver eventually archives
  • Data Loader — Loads the reference data for the tables that Data Archiver manages
  • Experiment Manager — Central coordination service (GraphQL on :5100); provides session triggers and receives completion notices
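
A hypothetical sketch of what such a config.yaml might look like — the key names are assumptions, not the service's actual schema:

```yaml
# Illustrative only: real key names and structure may differ.
archiver:
  s3_bucket: or-platform-archive     # target bucket for archived CSVs
  session: daily-archive             # session whose ticks trigger the cycle
  age_threshold: 7d                  # data older than this is archived
  tables:                            # actuals tables to process
    - segment_actual
    - way_actual
    - event
```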

User documentation for Optimal Reality