Data Ingestion

[Diagram: OR platform architecture. Platform users (engineers and low-code ops users) work with the OR Platform tooling (ORA, ADK, SDK UI, FDK, DDK, MDK, Nexus, SCDK). This tooling sits on the SDK API (GraphQL Federation Gateway) and the domain microservices, including the Data Pipeline. Nexus deploys OR applications for application users (operations teams).]

Overview

The Data Ingestion service is the primary entry point for polling-based external data into the OR platform. It collects data from HTTP, FTP, and SQS sources — including client internal feeds (via the ESB), third-party APIs, and government open data endpoints — and forwards it to the Data Transformer for normalisation into OR-compliant formats.

Each external API endpoint has its own schema, authentication method, and update frequency. Rather than building bespoke integrations per source, Data Ingestion provides a configuration-driven ingestion framework that handles the polling lifecycle, credential management, and delivery to the transformation layer. This pattern keeps complexity low and latency minimal, while making it straightforward to add or remove data sources as operational requirements evolve.

Within the broader data pipeline, Data Ingestion sits at the earliest stage — upstream of Data Transformer, Data Fusion, and all downstream consumers. It operates alongside Data Stream Ingestion (which handles event-driven message sources) and Data Redis Ingestion (which handles real-time computer vision feeds), collectively forming the platform's ingestion layer.

Architecture

The Data Ingestion service operates as a configuration-driven polling engine that supports multiple data source types and protocols. It scales horizontally to handle high-frequency data collection across multiple tenants while maintaining efficient resource utilisation.

Key Capabilities

  • Configuration-driven polling: Each data source is defined through configuration that specifies how to connect to, authenticate with, and poll the source. Data sources can therefore be added or removed without code changes.
  • Coordinated scheduling: The service manages polling frequencies across distributed instances so that each source is queried at its configured interval without duplicate or conflicting polls.
  • Reliable forwarding: Collected data is forwarded immediately to the transformation services for normalisation and enrichment before it enters the platform's data pipeline.
  • Health monitoring: The service exposes health check endpoints for deployment orchestration and monitoring systems.
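The coordinated scheduling capability can be pictured with a minimal sketch. The documentation does not describe the actual coordination mechanism, so the stable-hash ownership scheme below, and the names `owner_for` and `due_sources`, are assumptions made purely for illustration.

```python
import hashlib


def owner_for(source: str, instances: list[str]) -> str:
    """Pick exactly one instance per source using a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(source.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return sorted(instances)[index]


def due_sources(
    instance: str,
    instances: list[str],
    intervals: dict[str, float],
    last_polled: dict[str, float],
    now: float,
) -> list[str]:
    """Return the sources this instance owns whose polling interval has elapsed."""
    due = []
    for source, interval in sorted(intervals.items()):
        if owner_for(source, instances) != instance:
            continue  # another instance is responsible for this source
        # A source that has never been polled is always due.
        if now - last_polled.get(source, float("-inf")) >= interval:
            due.append(source)
    return due
```

Because every instance computes the same owner for a given source, no source is polled twice in the same cycle. A hash-modulo assignment reshuffles ownership whenever the instance count changes, so a real deployment might prefer leases or a shared lock instead.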

Data Flow

External data sources are polled at configured intervals. The collected data then flows through a transformation layer that normalises it into platform-standard formats before it is distributed to downstream consumers.
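As a rough illustration of this flow, the sketch below polls each source once and hands the payload to a forwarding step. The function `run_poll_cycle` and the injected `fetch` and `forward` callables are hypothetical stand-ins for the real protocol clients and the Data Transformer hand-off. The per-source failure isolation is likewise an assumption, not documented behaviour.

```python
from typing import Callable


def run_poll_cycle(
    sources: list[str],
    fetch: Callable[[str], bytes],
    forward: Callable[[str, bytes], None],
) -> dict[str, str]:
    """Poll each source once, forward its payload, and record the outcome."""
    outcomes: dict[str, str] = {}
    for source in sources:
        try:
            payload = fetch(source)    # e.g. an HTTPS GET or SFTP download
            forward(source, payload)   # hand-off to the transformation layer
            outcomes[source] = "forwarded"
        except Exception as exc:
            # Assumed behaviour: one failing source does not block the others.
            outcomes[source] = f"failed: {exc}"
    return outcomes
```

Passing `fetch` and `forward` in as arguments keeps the loop independent of any particular protocol, which matches the configuration-driven design described above.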

Configuration

The service uses a configuration-driven approach where each data source is defined with its connection details, authentication requirements, and polling frequency. This enables operational teams to manage data sources through configuration updates rather than code deployments, reducing time-to-integrate new data feeds and simplifying maintenance.
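The real configuration schema is not shown in this documentation. As a sketch only, a data source definition could carry the three elements named above (connection details, authentication, polling frequency). Every field name below, including `auth_secret_ref` and `poll_interval_seconds`, is an assumption, and the endpoint is a placeholder.

```python
from dataclasses import dataclass

# Protocols taken from the Ingested Sources table.
SUPPORTED_PROTOCOLS = {"HTTPS", "SFTP", "SQS"}


@dataclass(frozen=True)
class SourceConfig:
    """Hypothetical shape of one data source definition."""
    name: str
    protocol: str
    endpoint: str
    auth_secret_ref: str          # reference to a stored credential, not the secret itself
    poll_interval_seconds: int


def parse_source(raw: dict) -> SourceConfig:
    """Validate one raw config entry and return a typed SourceConfig."""
    required = ("name", "protocol", "endpoint", "auth_secret_ref", "poll_interval_seconds")
    missing = [key for key in required if key not in raw]
    if missing:
        raise ValueError(f"missing config keys: {missing}")
    protocol = str(raw["protocol"]).upper()
    if protocol not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"unsupported protocol: {protocol}")
    interval = int(raw["poll_interval_seconds"])
    if interval <= 0:
        raise ValueError("poll_interval_seconds must be positive")
    return SourceConfig(
        name=str(raw["name"]),
        protocol=protocol,
        endpoint=str(raw["endpoint"]),
        auth_secret_ref=str(raw["auth_secret_ref"]),
        poll_interval_seconds=interval,
    )


# Example entry; the endpoint and secret reference are placeholders.
rtdms = parse_source({
    "name": "RTDMS",
    "protocol": "https",
    "endpoint": "https://example.invalid/rtdms",
    "auth_secret_ref": "secrets/rtdms-api-key",
    "poll_interval_seconds": 60,
})
```

Validating entries when they are loaded means a malformed definition is rejected before any polling starts, which suits a workflow where operations teams edit configuration directly.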

Ingested Sources

The service currently polls a wide range of client and third-party sources:

Source                            Protocol  Frequency
AddInsight Links                  HTTPS     30s
STREAMS Vehicle Detectors         HTTPS     5 min
RTDMS                             HTTPS     60s
SITREP (Road Closures)            HTTPS     30s
LUMS                              HTTPS     10 min
VSLS                              HTTPS     10 min
VMS / VMS Composites              HTTPS     10 min
Metro Train Positions             HTTPS     30s
Metro Train Trip Updates          HTTPS     30s
Metro Train Service Alerts        HTTPS     30s
PTV Disruptions                   HTTPS     5 min
BOM Rainfall                      SFTP      5 min
RAI Jobs                          HTTPS     5 min
IRS EyeFi                         HTTPS     30s
ServiceNow IRS                    HTTPS     30 min
Tow Allocation                    HTTPS     60s
RWE                               HTTPS     5 min
ETS                               HTTPS     5 min
Ramp AHS / Metering / Operations  HTTPS     5 min
Off Ramps                         HTTPS     5 min
ESLS                              HTTPS     2 min
SCATS Site Status / PFL           HTTPS     1 hour
RID Impacts                       SQS       Variable (long poll)
OneView                           HTTPS     5 min
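The frequencies in the table mix seconds, minutes, and hours. If they need to be compared or fed to a scheduler, a small helper can convert them to seconds. The function name `frequency_to_seconds` is hypothetical, and the sketch deliberately rejects the "Variable (long poll)" entry, which has no fixed interval.

```python
import re

_UNIT_SECONDS = {"s": 1, "min": 60, "hour": 3600}


def frequency_to_seconds(text: str) -> int:
    """Convert a table frequency such as '30s', '5 min' or '1 hour' to seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*(s|min|hour)\s*", text)
    if match is None:
        raise ValueError(f"unrecognised frequency: {text!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]
```

For example, `frequency_to_seconds("5 min")` returns 300, while the long-polled RID entry raises `ValueError` and has to be handled separately.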

Message Queue Ingestion (RID)

RID publishes impact data through a message queue service. Unlike HTTP sources that poll on a fixed interval, message queue ingestion uses a long polling approach: the service requests data and waits for new messages, reducing both end-to-end latency and API call volume.

Long polling enables near-real-time data delivery by starting a new ingestion cycle immediately when a message is received rather than waiting for the next scheduled poll interval.
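The long polling pattern can be sketched with an in-memory queue standing in for the message queue service. The function `long_poll_loop` and its parameters are hypothetical. `max_empty_waits` exists only so the sketch terminates, whereas a real service would presumably loop indefinitely. With Amazon SQS, the comparable blocking wait is set through the `WaitTimeSeconds` parameter of `ReceiveMessage`.

```python
import queue
from typing import Callable


def long_poll_loop(
    source: "queue.Queue[str]",
    handle: Callable[[str], None],
    wait_seconds: float,
    max_empty_waits: int,
) -> int:
    """Receive messages with a blocking wait, looping again right after each one."""
    handled = 0
    empty_waits = 0
    while empty_waits < max_empty_waits:
        try:
            # Blocks for up to wait_seconds but returns as soon as a message arrives.
            message = source.get(timeout=wait_seconds)
        except queue.Empty:
            empty_waits += 1  # the wait elapsed with nothing to receive
            continue
        handle(message)       # start the ingestion cycle for this message
        handled += 1
        empty_waits = 0       # a message arrived, so reset the idle counter
    return handled
```

The loop has no fixed sleep between iterations. Latency is bounded by how quickly a message arrives, and an idle queue costs one request per `wait_seconds` rather than one per short poll.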

Platform logging tracks which RID impacts have been successfully processed, supporting debugging and audit workflows.

Related Services

  • Data Transformer: Downstream consumer that normalises ingested data
  • Data Stream Ingestion: Sibling ingestion service for event-driven message streams
  • Data Redis Ingestion: Sibling ingestion service for real-time computer vision feeds
  • Data Fusion: Fuses transformed data from multiple sources
  • Experiment Manager: Central orchestration service

User documentation for Optimal Reality