Data Recorder

[Platform architecture diagram: ORA planning interface, SDK/FDK/DDK/MDK/SCDK toolkits, GraphQL Federation Gateway, domain microservices (data pipeline, metrics, spatial, simulation, event detection, and others), and Nexus-deployed OR applications serving operations teams]

Overview

Data Recorder bridges the gap between the OR platform's real-time data layer and its historical data stores. Live operational data in the platform is cached in Redis with short TTLs — ranging from a minute to a couple of hours — which means it would be lost without an active mechanism to persist it. Data Recorder fills this role by periodically reading live data from Redis and writing it to PostgreSQL, ensuring that operational data is preserved for historical analysis, reporting, and validation.

The service listens for session events from the Experiment Manager (on a configurable interval, typically every 5 minutes) and uses each session tick as a trigger to snapshot multiple data types simultaneously. For each data type, it queries the live data from Redis via GraphQL, formats the results into DataFrames, and writes them to the corresponding actuals tables in PostgreSQL.
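The tick-driven cycle described above can be sketched as follows. Python is used here purely for illustration (the service itself is written in Julia), and every name in the sketch — `DATA_TYPES`, `snapshot`, `on_session_tick` — is hypothetical; the real service fetches live data from Redis via GraphQL rather than from an in-memory store.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical subset of the data types snapshotted on each session tick.
DATA_TYPES = ["way_actuals", "environment_actuals", "event_data"]

def snapshot(data_type, live_store):
    """Read one data type's live data; stands in for the Redis/GraphQL query."""
    return live_store.get(data_type, [])

def on_session_tick(live_store):
    """Triggered by each session tick: snapshot every data type in parallel."""
    with ThreadPoolExecutor(max_workers=len(DATA_TYPES)) as pool:
        results = pool.map(lambda dt: (dt, snapshot(dt, live_store)), DATA_TYPES)
        return dict(results)
```

The key point the sketch captures is that the tick is the only trigger: the service does no polling of its own, and all data types are snapshotted concurrently within the same recording window.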

Within the data pipeline lifecycle, Data Recorder sits downstream of Data Fusion — it consumes the fused, time-batched live data that Fusion has written to Redis, and persists it for longer-term use by the Data Archiver and other analytical services.

Architecture

  • Port: :6500
  • Language: Julia
  • Scaling: Singleton

Key Components

  • Session-triggered recording — Subscribes to session updates from the Experiment Manager via GraphQL subscription. Each session tick (e.g. every 5 minutes) triggers the recording cycle.
  • Multi-threaded data collection — On each trigger, multiple threads simultaneously collect and save different types of live data, maximising throughput within the recording window.
  • Prometheus metrics — Tracks recording duration and performance metrics, exposed via the HTTP server for monitoring.
  • HTTP health server — Thread 1 runs an HTTP server for Kubernetes health checks and Prometheus metric scraping.
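A minimal sketch of the health/metrics endpoint that Thread 1 provides, assuming a plain HTTP server with `/healthz` and `/metrics` routes (Python for illustration; the real service is Julia and uses a proper Prometheus client, so the route names and the text-format metrics here are assumptions):

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# In-memory metric store; a real service would use a Prometheus client library.
METRICS = {"recording_duration_seconds": 0.0}

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            body = b"ok"  # liveness/readiness probe target for Kubernetes
        elif self.path == "/metrics":
            # Prometheus text exposition: one "name value" line per metric.
            body = "\n".join(f"{k} {v}" for k, v in METRICS.items()).encode()
        else:
            self.send_response(404)
            self.end_headers()
            return
        self.send_response(200)
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

def start_health_server(port=6500):
    """Run the health/metrics endpoint on a background thread."""
    server = HTTPServer(("127.0.0.1", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Running this on the service's dedicated thread keeps probes and scrapes responsive even while the recording threads are busy writing snapshots.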

Data Flow

Session Manager → Experiment Manager (session tick every 5 min)
        ↓ (GraphQL subscription)
Data Recorder [:6500]
  ├── Thread 2: Subscribe to session events
  └── Threads 3+: For each data type:
        ├── Query live data from Redis (via GraphQL)
        ├── Format into DataFrame
        ├── Write to PostgreSQL actuals table
        └── Post recording notice to Redis

Sequence of Operations

  1. Pod deploys — Thread 1 initialises and starts the HTTP server
  2. Subscription starts — Thread 2 subscribes to session events via GraphQL
  3. Session tick received — For each data type, a new thread:
    • Queries live data from Redis through GraphQL
    • Formats the response into a structured DataFrame
    • Writes the DataFrame to the appropriate PostgreSQL actuals table
    • Posts a recording completion notice back to Redis
    • Stores internal performance metrics
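The per-data-type steps above can be sketched as a single worker function. This is an illustrative Python sketch, not the Julia implementation: `rows` stands in for the already-fetched GraphQL response, SQLite stands in for PostgreSQL, and the `notices` dict stands in for the Redis recording-notice write.

```python
import sqlite3
import pandas as pd

def record_data_type(rows, table, conn, notices):
    """One recording cycle for a single data type (all names hypothetical).

    rows    -- live data already fetched (the service queries Redis via GraphQL)
    table   -- target actuals table in the relational store
    conn    -- database connection (SQLite stands in for PostgreSQL here)
    notices -- dict standing in for the Redis recording-notice channel
    """
    df = pd.DataFrame(rows)                                   # format response
    df.to_sql(table, conn, if_exists="append", index=False)   # persist snapshot
    notices[table] = len(df)                                  # completion notice
    return df
```

Because each data type runs this cycle on its own thread, a slow write for one table does not delay the snapshots of the others.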

Recorded Data Types

Data Recorder captures snapshots of all live data types maintained by Data Fusion and other real-time services, including:

  • Way actuals (road speed, volume, flow, travel time)
  • Environment object actuals (device states, signal status)
  • Event data (incidents, closures, planned works)
  • Congestion tail data
  • Public transport data
Related Services

  • Data Fusion — Upstream source of fused live data in Redis
  • Data Archiver — Archives recorded PostgreSQL data to S3 for long-term storage
  • Experiment Manager — Central coordination service (GraphQL on :5100); provides session triggers and data queries
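One way to picture the relationship between recorded data types and their actuals tables is a static lookup, sketched below. The mapping is entirely hypothetical — the real table names come from the platform's schema definitions, not from this document.

```python
# Hypothetical mapping from recorded data types to their actuals tables;
# the actual names are defined by the platform schema, not shown here.
ACTUALS_TABLES = {
    "way_actuals": "way_actuals",
    "environment_object_actuals": "environment_object_actuals",
    "event_data": "event_actuals",
    "congestion_tail": "congestion_tail_actuals",
    "public_transport": "public_transport_actuals",
}

def table_for(data_type):
    """Resolve the target table for a data type, failing fast on unknown types."""
    try:
        return ACTUALS_TABLES[data_type]
    except KeyError:
        raise ValueError(f"no actuals table configured for {data_type!r}")
```

Failing fast on an unknown data type keeps a misconfigured recorder from silently dropping a snapshot.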

User documentation for Optimal Reality