Data Loader
Overview
Data Loader is responsible for populating the OR platform's relational databases with the static and reference data that underpins all real-time operations. Before any live data can be ingested, transformed, or displayed, the platform needs a foundation of geographic, device, and configuration data — maps, road segments, traffic signal locations, public transport routes, user accounts, and baseline metrics. Data Loader builds this foundation.
The service runs on stack startup or when reference data updates are required. It connects to a secure AWS S3 bucket that serves as the landing zone for static files, fetches the configured datasets, processes them through source-specific loading pipelines, and populates the PostgreSQL database with correctly structured reference data. This data then feeds into every downstream service — from Data Transformer (which uses it for spatial tagging) to the frontend (which uses it for map rendering).
Data can arrive in the S3 landing zone through several paths: pushed from the client side using the AWS SDK with authenticated secure transfer, pulled from the OR side via a custom ETL pipeline, or manually uploaded by the Optimal Reality support team on an ad-hoc basis.
Architecture
- Port: :2000
- Language: Julia
- Scaling: Singleton
Key Components
- Configuration-driven loading: A config.yaml file defines which datasets to load, their S3 locations, fetch methods, and load order. The config is read on startup and can also be passed as a dictionary to the running server's execute endpoint.
- Fetch + Load pipeline: Each data source follows a two-step process:
  - Fetch: Download data from S3 (or another source) and save it locally on disk or in memory
  - Load: Insert the processed data into the relevant PostgreSQL table
- Dependency management: Load functions can declare dependencies on other load functions, ensuring correct execution order (e.g. segments must be loaded before OSM data).
- GraphQL integration: Requires an active Experiment Manager (GraphQL) connection for coordination and to verify database readiness.
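The dependency management described above amounts to a topological ordering of the load functions. The following is a minimal sketch of that idea, not Data Loader's actual implementation: the function name `resolve_load_order` and the shape of the `deps` dictionary are assumptions made for illustration.

```julia
# Hypothetical sketch: order loader functions so that dependencies run first.
# `deps` maps each loader name to the names in its `depends-on` list.
function resolve_load_order(deps::Dict{String,Vector{String}})
    order = String[]
    state = Dict{String,Symbol}()  # :visiting while on the stack, :done once emitted

    function visit(name::String)
        s = get(state, name, :new)
        s === :done && return
        s === :visiting && error("circular dependency involving $(name)")
        haskey(deps, name) || error("unknown loader function $(name)")
        state[name] = :visiting
        for dep in deps[name]
            visit(dep)
        end
        state[name] = :done
        push!(order, name)
    end

    # Visit names in sorted order so the result is deterministic.
    for name in sort!(collect(keys(deps)))
        visit(name)
    end
    return order
end

# Mirrors the example in the text: segments must be loaded before OSM data.
deps = Dict(
    "segment" => String[],
    "osm"     => ["segment"],
)
order = resolve_load_order(deps)
```

With this input, `segment` is placed before `osm`. A cycle or a reference to an undefined loader raises an error rather than producing a partial order.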
Dependencies
The following services must be running for Data Loader to operate:
- Experiment Manager (GraphQL)
- PostgreSQL (primary)
- PostgreSQL (simulation)
- Redis
Data Flow
AWS S3 (static files landing zone)
↓ (fetch configured datasets)
Data Loader [:2000]
├── Parse config.yaml
├── Resolve dependencies between load functions
├── Fetch data from S3
└── Load into PostgreSQL tables
↓
PostgreSQL (Core Reference Schema)
↓
Experiment Manager → All downstream services
Data Sources
Data Loader manages the following reference datasets:
| Dataset | Description | Format | Used By |
|---|---|---|---|
| OSM Map | OpenStreetMap ways and nodes | .osm | Situational Awareness, Schedule Generation |
| Segments | Static segment polygons and metadata | .csv | Situational Awareness |
| Environment Objects | CCTV, LUMS, SCATS intersections, sensors | .csv | Situational Awareness |
| Users & Roles | Login credentials and personalisation data | .csv | Core platform |
| Baselines | Segment and way metric baselines | .csv | Situational Awareness, Schedule Generation |
| Paths | All possible paths within the area of interest | .csv | Traffic Model |
| Schedule Environment | Segment lane data, intersection mappings | .csv | Traffic Model |
| SCATS Detector Counts | Historical detector count values | .csv | Schedule Generation |
Configuration
The config.yaml specifies fetch and load behaviour for each data source:
yaml
init-functions:
  data-archive:
    s3-bucket-name: or-tfnsw-poc-data
    s3-archive-folder: archive
    postgres-table-names:
      - envobj_actual
      - event_actual
      - segment_actual
      - way_actual
      - event
    depends-on:
      - loader-functions
loader-functions:
  segment:
    load-method: s3
    download-parameters:
      segment:
        s3-bucket-name: or-dot-poc-data
        s3-file-key: data_loader/segment.csv
  osm:
    load-method: s3
    download-parameters:
      osm:
        s3-bucket-name: or-dot-poc-data
        s3-file-key: data_loader/knox.osm
    depends-on:
      - segment
Configuration Options
| Option | Description | Required |
|---|---|---|
| load-method | How to fetch the data (e.g. s3, lightosm-placename, lightosm-radius) | Yes |
| download-parameters | S3 bucket name and file key, or other fetch-method-specific params | Yes (if S3) |
| overwrite-local-files | Skip download if a local copy exists | No |
| depends-on | List of loader functions that must complete first | No |
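Because the config can also be supplied as a dictionary, the required options in the table can be checked before a load is attempted. The sketch below is illustrative only: `validate_loader` is a hypothetical helper, not part of Data Loader, and it assumes each `download-parameters` entry for the s3 method carries `s3-bucket-name` and `s3-file-key` as in the example config.

```julia
# Hypothetical validator for one `loader-functions` entry given as a Dict.
# Returns a list of human-readable problems; an empty list means the entry passed.
function validate_loader(name::AbstractString, entry::AbstractDict)
    problems = String[]
    method = get(entry, "load-method", nothing)
    if method === nothing
        push!(problems, "$(name): missing required option 'load-method'")
    elseif method == "s3"
        params = get(entry, "download-parameters", nothing)
        if !(params isa AbstractDict) || isempty(params)
            push!(problems, "$(name): 'download-parameters' is required for load-method s3")
        else
            for (source, p) in params
                for key in ("s3-bucket-name", "s3-file-key")
                    if !(p isa AbstractDict) || !haskey(p, key)
                        push!(problems, "$(name).$(source): missing '$(key)'")
                    end
                end
            end
        end
    end
    return problems
end

config = Dict(
    "segment" => Dict(
        "load-method" => "s3",
        "download-parameters" => Dict(
            "segment" => Dict(
                "s3-bucket-name" => "or-dot-poc-data",
                "s3-file-key"    => "data_loader/segment.csv",
            ),
        ),
    ),
    "osm" => Dict("depends-on" => ["segment"]),  # deliberately incomplete
)

problems = [msg for name in sort!(collect(keys(config)))
                for msg in validate_loader(name, config[name])]
```

Here the `segment` entry passes and the incomplete `osm` entry is reported as missing `load-method`. Checks for other fetch methods are left out because their parameters are not described in this document.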
Local Development
Running in the REPL
julia
using DataLoader
DataLoader.initialise_experiment_manager() # Connect to GraphQL
DataLoader.App.__init__() # Load config.yaml
DataLoader.App.execute_config_loader_functions() # Run all configured loaders
Loading a Specific Function
julia
# Load only environment objects (still requires config.yaml definition)
DataLoader.App.execute_config_loader_functions(["envobj"])
Loading OSM Data
julia
using DataLoader, LightOSM
g = graph_from_file("data/melb_cbd_1km.osm")
DataLoader.App.load_osm_network(g)
Operational Notes
Environment Object Reloading
When environment object IDs change, the database tables must be updated to remove old versions. Because device data is cross-referenced across the envobj, device static, and device actual tables, deletions must happen in order — first from static and actual tables, then from the envobj table.
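The deletion order can be made explicit by generating the statements in sequence. This is a sketch under stated assumptions: the table names `device_static`, `device_actual`, and `envobj` and the column `envobj_id` are hypothetical placeholders, and the real schema may differ.

```julia
# Hypothetical table and column names; check them against the real schema.
const DEPENDENT_TABLES = ["device_static", "device_actual"]
const PARENT_TABLE = "envobj"

# Build parameterised DELETE statements in the required order:
# the static and actual tables first, then the envobj table.
function envobj_delete_statements(n_ids::Integer)
    n_ids > 0 || throw(ArgumentError("at least one id is required"))
    placeholders = join(("\$$i" for i in 1:n_ids), ", ")  # "$1, $2, ..."
    tables = vcat(DEPENDENT_TABLES, PARENT_TABLE)
    return ["DELETE FROM $(t) WHERE envobj_id IN ($(placeholders))" for t in tables]
end

stmts = envobj_delete_statements(2)
```

Running the statements in the returned order, ideally inside a single transaction, keeps the cross-referenced rows from being orphaned partway through.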
Base Map Updates
Map files are loaded in Dev and promoted through higher environments. Mismatched files between environments can cause issues. When loading a new map, the internal map table must also be updated so that the broader application picks up the change.
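One way to detect mismatched map files before promotion is to compare checksums. The sketch below is an illustration rather than an existing Data Loader feature: the helper names `file_digest` and `same_map_file` are assumptions, and it uses only Julia's standard-library SHA module.

```julia
using SHA  # Julia standard library

# SHA-256 digest of a file, as a hex string.
file_digest(path::AbstractString) = bytes2hex(open(sha256, path))

# True when two map files are byte-for-byte identical.
same_map_file(a::AbstractString, b::AbstractString) = file_digest(a) == file_digest(b)

# Example (paths are hypothetical):
# same_map_file("dev/knox.osm", "staging/knox.osm")
```

Recording the digest alongside each environment's map file makes a mismatch visible without transferring the files themselves.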
Related Services
- Data Transformer: Uses loaded reference data for spatial tagging and transformation
- Data Fusion: Depends on hex and segment data loaded by Data Loader
- Data Archiver: Archives data from tables that Data Loader initially populates
- Experiment Manager: Central coordination service (GraphQL on :5100)
