Data Recorder

[Platform architecture diagram: ORA planning interface, SDK/FDK/DDK/MDK/SCDK toolkits, GraphQL Federation Gateway, domain microservices (data pipeline, metrics, spatial, simulation, event detection, and others), and Nexus-deployed OR applications serving operations teams]

Overview

Data Recorder bridges the gap between the OR platform's real-time data layer and its historical data stores. Live operational data in the platform is cached in Redis with short TTLs — ranging from a minute to a couple of hours — which means it would be lost without an active mechanism to persist it. Data Recorder fills this role by periodically reading live data from Redis and writing it to PostgreSQL, ensuring that operational data is preserved for historical analysis, reporting, and validation.

The service listens for session events from the Experiment Manager (on a configurable interval, typically every 5 minutes) and uses each session tick as a trigger to snapshot multiple data types simultaneously. For each data type, it queries the live data from Redis via GraphQL, formats the results into DataFrames, and writes them to the corresponding actuals tables in PostgreSQL.
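The tick-driven cycle described above can be sketched as follows. Python is used here purely for illustration (the service itself is written in Julia), and every name in the sketch — `DATA_TYPES`, `snapshot`, `on_session_tick` — is hypothetical; the real service fetches live data from Redis via GraphQL rather than from an in-memory store.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical subset of the data types snapshotted on each session tick.
DATA_TYPES = ["way_actuals", "environment_actuals", "event_data"]

def snapshot(data_type, live_store):
    """Read one data type's live data; stands in for the Redis/GraphQL query."""
    return live_store.get(data_type, [])

def on_session_tick(live_store):
    """Triggered by each session tick: snapshot every data type in parallel."""
    with ThreadPoolExecutor(max_workers=len(DATA_TYPES)) as pool:
        results = pool.map(lambda dt: (dt, snapshot(dt, live_store)), DATA_TYPES)
        return dict(results)
```

The key point the sketch captures is that the tick is the only trigger: the service does no polling of its own, and all data types are snapshotted concurrently within the same recording window.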

Within the data pipeline lifecycle, Data Recorder sits downstream of Data Fusion — it consumes the fused, time-batched live data that Fusion has written to Redis, and persists it for longer-term use by the Data Archiver and other analytical services.

Architecture

  • Port: :6500
  • Language: Julia
  • Scaling: Singleton

Key Components

  • Session-triggered recording — Subscribes to session updates from the Experiment Manager via GraphQL subscription. Each session tick (e.g. every 5 minutes) triggers the recording cycle.
  • Multi-threaded data collection — On each trigger, multiple threads simultaneously collect and save different types of live data, maximising throughput within the recording window.
  • Prometheus metrics — Tracks recording duration and performance metrics, exposed via the HTTP server for monitoring.
  • HTTP health server — Thread 1 runs an HTTP server for Kubernetes health checks and Prometheus metric scraping.
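A minimal sketch of the health/metrics endpoint that Thread 1 provides, assuming a plain HTTP server with `/healthz` and `/metrics` routes (Python for illustration; the real service is Julia and uses a proper Prometheus client, so the route names and the text-format metrics here are assumptions):

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory metric store; a real service would use a Prometheus client library.
METRICS = {"recording_duration_seconds": 0.0}

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            body = b"ok"  # liveness/readiness probe target for Kubernetes
        elif self.path == "/metrics":
            # Prometheus text exposition: one "name value" line per metric.
            body = "\n".join(f"{k} {v}" for k, v in METRICS.items()).encode()
        else:
            self.send_response(404)
            self.end_headers()
            return
        self.send_response(200)
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

def start_health_server(port=6500):
    """Run the health/metrics endpoint on a background thread."""
    server = HTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Running this on the service's dedicated thread keeps probes and scrapes responsive even while the recording threads are busy writing snapshots.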

Data Flow

Session Manager → Experiment Manager (session tick every 5 min)
        ↓ (GraphQL subscription)
Data Recorder [:6500]
  ├── Thread 2: Subscribe to session events
  └── Threads 3+: For each data type:
        ├── Query live data from Redis (via GraphQL)
        ├── Format into DataFrame
        ├── Write to PostgreSQL actuals table
        └── Post recording notice to Redis

Sequence of Operations

  1. Pod deploys — Thread 1 initialises and starts the HTTP server
  2. Subscription starts — Thread 2 subscribes to session events via GraphQL
  3. Session tick received — For each data type, a new thread:
    • Queries live data from Redis through GraphQL
    • Formats the response into a structured DataFrame
    • Writes the DataFrame to the appropriate PostgreSQL actuals table
    • Posts a recording completion notice back to Redis
    • Stores internal performance metrics
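The per-data-type steps above can be sketched as a single worker function. This is an illustrative Python sketch, not the Julia implementation: `rows` stands in for the already-fetched GraphQL response, SQLite stands in for PostgreSQL, and the `notices` dict stands in for the Redis recording-notice write.

```python
import sqlite3
import pandas as pd

def record_data_type(rows, table, conn, notices):
    """One recording cycle for a single data type (all names hypothetical).

    rows    -- live data already fetched (the service queries Redis via GraphQL)
    table   -- target actuals table in the relational store
    conn    -- database connection (SQLite stands in for PostgreSQL here)
    notices -- dict standing in for the Redis recording-notice channel
    """
    df = pd.DataFrame(rows)                                   # format response
    df.to_sql(table, conn, if_exists="append", index=False)   # persist snapshot
    notices[table] = len(df)                                  # completion notice
    return df
```

Because each data type runs this cycle on its own thread, a slow write for one table does not delay the snapshots of the others.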

Recorded Data Types

Data Recorder captures snapshots of all live data types maintained by Data Fusion and other real-time services, including:

  • Way actuals (road speed, volume, flow, travel time)
  • Environment object actuals (device states, signal status)
  • Event data (incidents, closures, planned works)
  • Congestion tail data
  • Public transport data
Related Services

  • Data Fusion — Upstream source of fused live data in Redis
  • Data Archiver — Archives recorded PostgreSQL data to S3 for long-term storage
  • Experiment Manager — Central coordination service (GraphQL on :5100); provides session triggers and data queries
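One way to picture the relationship between recorded data types and their actuals tables is a static lookup, sketched below. The mapping is entirely hypothetical — the real table names come from the platform's schema definitions, not from this document.

```python
# Hypothetical mapping from recorded data types to their actuals tables;
# the actual names are defined by the platform schema, not shown here.
ACTUALS_TABLES = {
    "way_actuals": "way_actuals",
    "environment_object_actuals": "environment_object_actuals",
    "event_data": "event_actuals",
    "congestion_tail": "congestion_tail_actuals",
    "public_transport": "public_transport_actuals",
}

def table_for(data_type):
    """Resolve the target table for a data type, failing fast on unknown types."""
    try:
        return ACTUALS_TABLES[data_type]
    except KeyError:
        raise ValueError(f"no actuals table configured for {data_type!r}")
```

Failing fast on an unknown data type keeps a misconfigured recorder from silently dropping a snapshot.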

User documentation for Optimal Reality