Data Loader
Overview
Data Loader is responsible for populating the OR platform's reference databases with the static and reference data that underpins all real-time operations. Before any live data can be ingested, transformed, or displayed, the platform needs a foundation of geographic, device, and configuration data — maps, road segments, traffic signal locations, public transport routes, user accounts, and baseline metrics. Data Loader builds this foundation.
The service runs on stack startup or when reference data updates are required. It connects to secure cloud storage that serves as the landing zone for static files, fetches the configured datasets, processes them through source-specific loading pipelines, and populates the reference database with correctly structured data. This data then feeds into every downstream service — from Data Transformer (which uses it for spatial tagging) to the frontend (which uses it for map rendering).
Data can arrive in the cloud storage landing zone through several paths: pushed from the client side via authenticated secure transfer, pulled from the OR side via a custom data pipeline, or uploaded manually by the Optimal Reality support team on an ad-hoc basis.
Architecture
The Data Loader service coordinates the loading of reference datasets into the platform's databases. It is configuration-driven, so data sources and their loading sequence can be changed without modifying code.
Key Capabilities
- Configuration-driven loading — A configuration file defines which datasets to load, their cloud storage locations, fetch methods, and load order. The configuration is read on startup and can be updated to modify loading behavior without code changes.
- Fetch and Load pipeline — Each data source follows a two-step process:
  - Fetch: Download data from cloud storage or other sources and prepare it for processing
  - Load: Insert the processed data into the relevant database tables
- Dependency management — Load functions can declare dependencies on other load functions, ensuring correct execution order (e.g. segments must be loaded before map data).
- Platform integration — Integrates with the central orchestration service for coordination and to verify database readiness.
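Dependency management amounts to ordering load functions so that each one runs after everything it depends on, which is a topological sort. The sketch below uses Python's standard-library `graphlib` module; the load-function names and the dependency map are hypothetical and only echo the segments-before-map-data example above, and the real service may resolve its order differently.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical load functions and their declared dependencies:
# each key must run after every name in its value set.
DEPENDENCIES: dict[str, set[str]] = {
    "segments": set(),
    "map_data": {"segments"},
    "environment_objects": {"segments"},
    "baselines": {"segments", "map_data"},
    "users_and_roles": set(),
}


def resolve_load_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which every load function follows its dependencies.

    Raises ValueError if the declared dependencies contain a cycle.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # CycleError's second argument lists the nodes that form the cycle.
        raise ValueError(f"circular load dependency: {exc.args[1]}") from exc


if __name__ == "__main__":
    print(resolve_load_order(DEPENDENCIES))
```

A cycle in the declared dependencies (A needs B, B needs A) has no valid order, so the sketch fails before any data is loaded rather than leaving the database partially populated.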
Dependencies
The service requires several platform components to be operational:
- Central orchestration service (Experiment Manager)
- Primary reference database
- Simulation database
- Data cache
Data Flow
Cloud storage serves as the landing zone for static files. The Data Loader service fetches configured datasets, resolves dependencies between load functions, and populates database tables with reference data. This reference data then becomes available to all downstream services through the central orchestration layer.
Data Sources
Data Loader manages the following reference datasets:
| Dataset | Description | Used By |
|---|---|---|
| Map Data | Road network structure and nodes | Situational Awareness, Schedule Generation |
| Segments | Static segment polygons and metadata | Situational Awareness |
| Environment Objects | Camera locations, traffic signals, sensors | Situational Awareness |
| Users and Roles | Login credentials and personalisation data | Core platform |
| Baselines | Segment and way metric baselines | Situational Awareness, Schedule Generation |
| Paths | All possible paths within the area of interest | Traffic Model |
| Schedule Environment | Segment lane data, intersection mappings | Traffic Model |
| Detector Counts | Historical detector count values | Schedule Generation |
Configuration
Data Loader uses a configuration-driven approach where each data source specifies:
- Source location — Where to fetch data (cloud storage, APIs, or other sources)
- Fetch method — How to retrieve the data (direct download, API query, or dynamic generation)
- Target tables — Which database tables receive the loaded data
- Dependencies — Load order requirements to ensure referential integrity (e.g., segments must be loaded before map data)
- Download parameters — Source-specific settings such as locations or query parameters
Because data source definitions live in configuration, operational teams can add new reference datasets or update existing ones as requirements evolve, without code changes.
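A data source entry could be modelled as follows. This is a minimal sketch assuming a Python representation: the field names (`source_location`, `fetch_method`, `target_tables`, `depends_on`, `download_params`), the fetch-method labels, and the landing-zone paths are hypothetical stand-ins for the settings listed above, not the service's actual configuration format. It also shows the kind of check that catches a dependency on an undefined source before any loading starts.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the retrieval options described above.
FETCH_METHODS = {"download", "api_query", "generate"}


@dataclass
class DataSourceConfig:
    """One hypothetical data-source entry; field names are illustrative."""
    name: str
    source_location: str          # where to fetch from, e.g. a landing-zone path
    fetch_method: str             # how to retrieve the data; one of FETCH_METHODS
    target_tables: list[str]      # tables that receive the loaded rows
    depends_on: list[str] = field(default_factory=list)
    download_params: dict[str, str] = field(default_factory=dict)


def validate_config(sources: list[DataSourceConfig]) -> list[str]:
    """Return human-readable problems; an empty list means the config is usable."""
    problems: list[str] = []
    names = [s.name for s in sources]
    if len(names) != len(set(names)):
        problems.append("duplicate data source names")
    known = set(names)
    for s in sources:
        if s.fetch_method not in FETCH_METHODS:
            problems.append(f"{s.name}: unknown fetch method {s.fetch_method!r}")
        if not s.target_tables:
            problems.append(f"{s.name}: no target tables")
        for dep in s.depends_on:
            if dep not in known:
                problems.append(f"{s.name}: depends on undefined source {dep!r}")
    return problems


# Hypothetical entries; paths are placeholders relative to the landing zone.
EXAMPLE_CONFIG = [
    DataSourceConfig("segments", "reference/segments/", "download", ["segments"]),
    DataSourceConfig("map_data", "reference/map/", "download",
                     ["map_nodes", "map_ways"], depends_on=["segments"]),
    DataSourceConfig("baselines", "reference/baselines/", "generate",
                     ["segment_baselines"], depends_on=["segments", "map_data"],
                     download_params={"period": "weekday"}),
]
```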
Using Data Loader
Data Loader can be triggered through the platform's operational workflows for initial stack deployment or when reference data updates are required. The service automatically:
- Loads configuration — Reads data source definitions and determines the correct load order based on dependencies
- Fetches datasets — Downloads configured datasets from cloud storage or other sources
- Populates database — Loads data into database tables in dependency order
- Validates completion — Verifies that all required datasets have been loaded successfully
The service supports loading all configured datasets in a single operation, or loading specific datasets individually when needed. This flexibility enables both complete stack initialization and incremental updates to individual reference data sources.
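The run sequence above (resolve order, fetch, load, validate) can be sketched as follows, assuming an in-memory dictionary in place of the reference database and plain Python callables in place of the real fetch functions. `LoadStep`, `run`, and the `only` parameter are hypothetical. In this sketch, loading a single dataset assumes its dependencies are already present, and each load replaces the target table's contents; the real service may handle both cases differently.

```python
from graphlib import TopologicalSorter
from typing import Callable, Iterable, Optional

# Hypothetical in-memory stand-in for the reference database: table -> rows.
Database = dict[str, list[dict]]


class LoadStep:
    """Hypothetical fetch/load definition for one data source."""

    def __init__(self, name: str, table: str,
                 fetch: Callable[[], list[dict]],
                 depends_on: Iterable[str] = ()):
        self.name = name
        self.table = table
        self.fetch = fetch
        self.depends_on = set(depends_on)


def run(steps: list[LoadStep], db: Database,
        only: Optional[set[str]] = None) -> list[str]:
    """Fetch and load the selected sources in dependency order.

    only=None loads every configured source; otherwise only the named sources
    are loaded, and their dependencies are assumed to be loaded already.
    Returns the source names in the order they were loaded.
    """
    by_name = {s.name: s for s in steps}
    selected = set(by_name) if only is None else set(only)
    unknown = selected - set(by_name)
    if unknown:
        raise KeyError(f"unknown data sources: {sorted(unknown)}")

    graph = {s.name: s.depends_on for s in steps}
    loaded: list[str] = []
    for name in TopologicalSorter(graph).static_order():
        if name not in selected:
            continue
        step = by_name[name]
        rows = step.fetch()              # Fetch: retrieve and prepare rows
        db[step.table] = list(rows)      # Load: replace the target table's contents
        loaded.append(name)

    # Validate completion: every loaded source must have produced rows.
    empty = [n for n in loaded if not db[by_name[n].table]]
    if empty:
        raise RuntimeError(f"data sources loaded no rows: {empty}")
    return loaded


if __name__ == "__main__":
    steps = [
        LoadStep("map_data", "map_nodes", lambda: [{"node_id": 1}],
                 depends_on=["segments"]),
        LoadStep("segments", "segments", lambda: [{"segment_id": "S1"}]),
    ]
    db: Database = {}
    print(run(steps, db))                      # ['segments', 'map_data']
    print(run(steps, db, only={"map_data"}))   # ['map_data']
```

The first call is a full initialization: `segments` loads before `map_data` even though the list declares them in the opposite order. The second call is an incremental update that reloads only the map table.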
For operational guidance on triggering data loads or managing data source configurations, consult your platform administrator or support team.
Operational Notes
Environment Object Reloading
When environment object IDs change, the rows for the old IDs must be removed from the database. Because device data is cross-referenced across the envobj, device static, and device actual tables, deletions must happen in order: first from the device static and device actual tables, then from the envobj table. Deleting from envobj first would leave device rows pointing at environment objects that no longer exist.
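The deletion order can be illustrated with SQLite from Python's standard library. The schema is a hypothetical simplification (the real table and column names may differ), with foreign keys enabled so that deleting an environment object that device rows still reference is rejected. Wrapping all three deletes in one transaction means a failure part-way through leaves the tables unchanged.

```python
import sqlite3

# Hypothetical, simplified schema: device tables reference envobj by ID.
SCHEMA = """
CREATE TABLE envobj (envobj_id TEXT PRIMARY KEY, kind TEXT);
CREATE TABLE device_static (envobj_id TEXT REFERENCES envobj(envobj_id), lanes INTEGER);
CREATE TABLE device_actual (envobj_id TEXT REFERENCES envobj(envobj_id), status TEXT);
"""


def remove_environment_objects(conn: sqlite3.Connection, old_ids: list[str]) -> None:
    """Delete stale environment objects, referencing rows first, in one transaction."""
    params = [(i,) for i in old_ids]
    with conn:  # commits on success, rolls back if any statement raises
        # 1. Remove device rows that still reference the old IDs.
        conn.executemany("DELETE FROM device_static WHERE envobj_id = ?", params)
        conn.executemany("DELETE FROM device_actual WHERE envobj_id = ?", params)
        # 2. Only then remove the environment objects themselves.
        conn.executemany("DELETE FROM envobj WHERE envobj_id = ?", params)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    with conn:
        conn.execute("INSERT INTO envobj VALUES ('cam-1', 'camera')")
        conn.execute("INSERT INTO device_static VALUES ('cam-1', 2)")
        conn.execute("INSERT INTO device_actual VALUES ('cam-1', 'ok')")
    try:
        with conn:
            conn.execute("DELETE FROM envobj WHERE envobj_id = 'cam-1'")  # wrong order
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
    remove_environment_objects(conn, ["cam-1"])
    print(conn.execute("SELECT COUNT(*) FROM envobj").fetchone()[0])  # 0
```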
Base Map Updates
Map files are loaded in Dev first and then promoted through the higher environments, so the same files should be used at every stage; mismatched map files between environments can cause issues. When loading a new map, the internal map table must also be updated so that the rest of the application picks up the change.
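One way to keep the internal map table in step with the map data is to write both in a single transaction, so the application never sees a new map without the reference to it, or a reference to a map that was not fully loaded. This is a minimal sketch using SQLite; `map_nodes`, `internal_map`, and the `map_version` column are hypothetical, since the actual schema is not specified here.

```python
import sqlite3
from typing import Optional

# Hypothetical tables: map_nodes holds versioned map data; internal_map has a
# single row naming the map version the rest of the application should use.
SCHEMA = """
CREATE TABLE map_nodes (map_version TEXT, node_id INTEGER, lat REAL, lon REAL);
CREATE TABLE internal_map (id INTEGER PRIMARY KEY CHECK (id = 1), map_version TEXT);
INSERT INTO internal_map (id, map_version) VALUES (1, NULL);
"""


def load_new_map(conn: sqlite3.Connection, version: str,
                 nodes: list[tuple[int, float, float]]) -> None:
    """Insert a new map version and point internal_map at it atomically."""
    with conn:  # both writes commit together or not at all
        conn.executemany(
            "INSERT INTO map_nodes (map_version, node_id, lat, lon) VALUES (?, ?, ?, ?)",
            [(version, node_id, lat, lon) for node_id, lat, lon in nodes],
        )
        # Skipping this update would leave the application on the previous map.
        conn.execute("UPDATE internal_map SET map_version = ? WHERE id = 1", (version,))


def current_map_version(conn: sqlite3.Connection) -> Optional[str]:
    """Return the map version the application is currently pointed at."""
    return conn.execute("SELECT map_version FROM internal_map WHERE id = 1").fetchone()[0]
```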
Related Services
- Data Transformer — Uses loaded reference data for spatial tagging and transformation
- Data Fusion — Depends on hex and segment data loaded by Data Loader
- Data Archiver — Archives data from tables that Data Loader initially populates
- Experiment Manager — Central orchestration service
