
Data Loader

[Figure: OR platform architecture. Platform users (engineers and low-code ops users) work through ORA and the SDK UI (FDK, DDK, MDK, Nexus, SCDK), backed by the SDK API (GraphQL Federation Gateway) and domain microservices such as the Data Pipeline; Nexus deploys the resulting OR applications (e.g. Rail Ops, Mine Mgmt and Port Ops dashboards) for operations teams.]

Overview

Data Loader is responsible for populating the OR platform's relational databases with the static and reference data that underpins all real-time operations. Before any live data can be ingested, transformed, or displayed, the platform needs a foundation of geographic, device, and configuration data — maps, road segments, traffic signal locations, public transport routes, user accounts, and baseline metrics. Data Loader builds this foundation.

The service runs on stack startup or when reference data updates are required. It connects to a secure AWS S3 bucket that serves as the landing zone for static files, fetches the configured datasets, processes them through source-specific loading pipelines, and populates the PostgreSQL database with correctly structured reference data. This data then feeds into every downstream service — from Data Transformer (which uses it for spatial tagging) to the frontend (which uses it for map rendering).

Data can arrive in the S3 landing zone through several paths: pushed from the client side using the AWS SDK over an authenticated, secure transfer; pulled from the OR side via a custom ETL pipeline; or uploaded manually by the Optimal Reality support team on an ad-hoc basis.

Architecture

  • Port: :2000
  • Language: Julia
  • Scaling: Singleton

Key Components

  • Configuration-driven loading — A config.yaml file defines which datasets to load, their S3 locations, fetch methods, and load order. The config is read on startup and can also be passed as a dictionary to the running server's execute endpoint.
  • Fetch + Load pipeline — Each data source follows a two-step process:
    1. Fetch: Download data from S3 (or another source) and store it locally, on disk or in memory
    2. Load: Insert the processed data into the relevant PostgreSQL table
  • Dependency management — Load functions can declare dependencies on other load functions, ensuring correct execution order (e.g. segments must be loaded before OSM data).
  • GraphQL integration — Requires an active Experiment Manager (GraphQL) connection for coordination and to verify database readiness.
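Dependency resolution of this kind is a topological sort over the depends-on lists. The sketch below uses Kahn's algorithm on a config that has already been parsed into a Julia Dict; resolve_load_order is a hypothetical helper written for illustration, not Data Loader's own implementation, and the loader names mirror the config example further down.

```julia
# Hypothetical helper (not DataLoader's API): order loader functions so that each
# one runs only after everything in its "depends-on" list.
function resolve_load_order(loaders::AbstractDict)
    # Count unmet dependencies per loader and record who depends on whom.
    indegree = Dict(name => 0 for name in keys(loaders))
    dependents = Dict(name => String[] for name in keys(loaders))
    for (name, spec) in loaders
        for dep in get(spec, "depends-on", String[])
            haskey(loaders, dep) || error("loader '$name' depends on undefined loader '$dep'")
            indegree[name] += 1
            push!(dependents[dep], name)
        end
    end
    # Kahn's algorithm: repeatedly take a loader with no outstanding dependencies.
    ready = sort([n for (n, d) in indegree if d == 0])  # sorted for a deterministic order
    order = String[]
    while !isempty(ready)
        name = popfirst!(ready)
        push!(order, name)
        for child in dependents[name]
            indegree[child] -= 1
            indegree[child] == 0 && push!(ready, child)
        end
        sort!(ready)
    end
    # Anything left over is part of a cycle and can never become ready.
    length(order) == length(loaders) || error("cyclic depends-on detected")
    return order
end

# Example: osm depends on segment, as in the config.yaml example.
loaders = Dict(
    "osm"     => Dict("load-method" => "s3", "depends-on" => ["segment"]),
    "segment" => Dict("load-method" => "s3"),
    "envobj"  => Dict("load-method" => "s3"),
)
println(resolve_load_order(loaders))  # prints ["envobj", "segment", "osm"]
```

Sorting the ready set keeps the order deterministic when several loaders have no outstanding dependencies, and an unknown or circular depends-on entry raises an error instead of silently loading out of order.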

Dependencies

The following services must be running for Data Loader to operate:

  • Experiment Manager (GraphQL)
  • PostgreSQL (primary)
  • PostgreSQL (simulation)
  • Redis
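Because Data Loader cannot run until these services are up, a startup script might wait for them first. This is a minimal sketch using Julia's Sockets standard library; it only checks that each port accepts a TCP connection, not that the service behind it is fully ready. The hosts and ports are assumptions: 5432 and 6379 are the usual PostgreSQL and Redis defaults, 5100 is the Experiment Manager port noted at the end of this page, and postgres-primary and postgres-simulation are hypothetical hostnames.

```julia
using Sockets

# Return true if something accepts a TCP connection on host:port.
function port_open(host, port::Integer)
    try
        sock = Sockets.connect(host, port)
        close(sock)
        return true
    catch
        # Connection refused, DNS failure, etc. all count as "not reachable".
        return false
    end
end

# Poll until every dependency is reachable, or give up after `timeout` seconds.
function wait_for_dependencies(deps; timeout = 60.0, interval = 1.0)
    deadline = time() + timeout
    while true
        missing_deps = [name for (name, (host, port)) in deps if !port_open(host, port)]
        isempty(missing_deps) && return true
        if time() >= deadline
            @warn "Dependencies still unreachable" missing_deps
            return false
        end
        sleep(interval)
    end
end

# Hypothetical hosts and ports for the services listed above.
deps = [
    "Experiment Manager"      => ("localhost", 5100),
    "PostgreSQL (primary)"    => ("postgres-primary", 5432),
    "PostgreSQL (simulation)" => ("postgres-simulation", 5432),
    "Redis"                   => ("localhost", 6379),
]
# wait_for_dependencies(deps; timeout = 120)
```

Running the wait before DataLoader.initialise_experiment_manager() would surface a missing dependency up front rather than as a connection error partway through a load.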

Data Flow

```
AWS S3 (static files landing zone)
        ↓ (fetch configured datasets)
Data Loader [:2000]
  ├── Parse config.yaml
  ├── Resolve dependencies between load functions
  ├── Fetch data from S3
  └── Load into PostgreSQL tables
        ↓
PostgreSQL (Core Reference Schema)
        ↓
Experiment Manager → All downstream services
```
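The Fetch and Load steps in this flow can be sketched as a loop over an already-resolved load order. Here fetch_fns and load_fns are hypothetical stand-ins: in the service, fetch would download from S3 and load would insert rows into PostgreSQL. Stopping at the first failure is an assumption, made because later loaders may depend on the failed one; this page does not say how Data Loader itself handles errors.

```julia
# Outcome of one fetch + load step.
struct LoadResult
    name::String
    ok::Bool
    message::String
end

# Run fetch then load for each loader in dependency order (assumed already resolved).
function run_pipeline(order::Vector{String}, fetch_fns::AbstractDict, load_fns::AbstractDict)
    results = LoadResult[]
    for name in order
        try
            data = fetch_fns[name]()       # step 1: fetch to disk or memory
            rows = load_fns[name](data)    # step 2: insert into the target table
            push!(results, LoadResult(name, true, "loaded $rows rows"))
        catch err
            push!(results, LoadResult(name, false, sprint(showerror, err)))
            break  # assumption: later loaders may depend on this one, so stop here
        end
    end
    return results
end

# Toy usage with in-memory data instead of S3 and PostgreSQL.
fetch_fns = Dict("segment" => () -> [(id = 1,), (id = 2,)],
                 "osm"     => () -> error("object not found in bucket"))
load_fns  = Dict("segment" => rows -> length(rows),
                 "osm"     => rows -> length(rows))
for r in run_pipeline(["segment", "osm"], fetch_fns, load_fns)
    println(r.name, ": ", r.ok ? "ok" : "failed", " (", r.message, ")")
end
# segment: ok (loaded 2 rows)
# osm: failed (object not found in bucket)
```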

Data Sources

Data Loader manages the following reference datasets:

| Dataset | Description | Format | Used By |
|---|---|---|---|
| OSM Map | OpenStreetMap ways and nodes | .osm | Situational Awareness, Schedule Generation |
| Segments | Static segment polygons and metadata | .csv | Situational Awareness |
| Environment Objects | CCTV, LUMS, SCATS intersections, sensors | .csv | Situational Awareness |
| Users & Roles | Login credentials and personalisation data | .csv | Core platform |
| Baselines | Segment and way metric baselines | .csv | Situational Awareness, Schedule Generation |
| Paths | All possible paths within the area of interest | .csv | Traffic Model |
| Schedule Environment | Segment lane data, intersection mappings | .csv | Traffic Model |
| SCATS Detector Counts | Historical detector count values | .csv | Schedule Generation |

Configuration

The config.yaml specifies fetch and load behaviour for each data source:

```yaml
init-functions:
  data-archive:
    s3-bucket-name: or-tfnsw-poc-data
    s3-archive-folder: archive
    postgres-table-names:
      - envobj_actual
      - event_actual
      - segment_actual
      - way_actual
      - event
    depends-on:
      - loader-functions

loader-functions:
  segment:
    load-method: s3
    download-parameters:
      segment:
        s3-bucket-name: or-dot-poc-data
        s3-file-key: data_loader/segment.csv
  osm:
    load-method: s3
    download-parameters:
      osm:
        s3-bucket-name: or-dot-poc-data
        s3-file-key: data_loader/knox.osm
    depends-on:
      - segment
```
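Since the config can also be passed as a dictionary to the running server's execute endpoint, it can help to see the loader-functions section above as a Julia Dict. The exact shape the endpoint expects is an assumption here, and s3_uris is a hypothetical helper for listing the S3 objects each loader downloads.

```julia
# The loader-functions section of config.yaml as a nested Dict
# (roughly what YAML parsing would produce; endpoint shape is assumed).
config = Dict(
    "loader-functions" => Dict(
        "segment" => Dict(
            "load-method" => "s3",
            "download-parameters" => Dict(
                "segment" => Dict("s3-bucket-name" => "or-dot-poc-data",
                                  "s3-file-key"    => "data_loader/segment.csv"))),
        "osm" => Dict(
            "load-method" => "s3",
            "download-parameters" => Dict(
                "osm" => Dict("s3-bucket-name" => "or-dot-poc-data",
                              "s3-file-key"    => "data_loader/knox.osm")),
            "depends-on" => ["segment"]),
    ),
)

# Hypothetical helper: list the s3:// URI of every file a loader function downloads.
function s3_uris(config::AbstractDict, name::AbstractString)
    spec = config["loader-functions"][name]
    get(spec, "load-method", nothing) == "s3" || return String[]
    params = get(spec, "download-parameters", Dict())
    return sort(["s3://$(p["s3-bucket-name"])/$(p["s3-file-key"])" for p in values(params)])
end

println(s3_uris(config, "osm"))  # prints ["s3://or-dot-poc-data/data_loader/knox.osm"]
```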

Configuration Options

| Option | Description | Required |
|---|---|---|
| load-method | How to fetch the data (e.g. s3, lightosm-placename, lightosm-radius) | Yes |
| download-parameters | S3 bucket name and file key, or other fetch-method-specific parameters | Yes (if S3) |
| overwrite-local-files | Skip download if a local copy exists | No |
| depends-on | List of loader functions that must complete first | No |
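A config entry can be checked against this table before a load is started. The sketch below encodes only the Required column plus one extra assumption, that every depends-on entry must name another loader function; validate_loader is hypothetical and not part of Data Loader.

```julia
# Hypothetical validator: return a list of problems for one loader-function entry.
function validate_loader(name::AbstractString, spec::AbstractDict, all_names)
    problems = String[]
    method = get(spec, "load-method", nothing)
    method === nothing && push!(problems, "$name: missing required 'load-method'")
    # download-parameters is only required for the s3 load method.
    if method == "s3"
        params = get(spec, "download-parameters", nothing)
        if params === nothing
            push!(problems, "$name: 'download-parameters' is required when load-method is s3")
        else
            for (key, p) in params, field in ("s3-bucket-name", "s3-file-key")
                haskey(p, field) || push!(problems, "$name.$key: missing '$field'")
            end
        end
    end
    # Assumption: every dependency must be another configured loader function.
    for dep in get(spec, "depends-on", String[])
        dep in all_names || push!(problems, "$name: depends-on unknown loader '$dep'")
    end
    return problems
end

# Example entry with a missing file key and an undefined dependency ("roads").
spec = Dict("load-method" => "s3",
            "download-parameters" => Dict("osm" => Dict("s3-bucket-name" => "or-dot-poc-data")),
            "depends-on" => ["segment", "roads"])
foreach(println, validate_loader("osm", spec, ["segment", "osm"]))
# osm.osm: missing 's3-file-key'
# osm: depends-on unknown loader 'roads'
```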

Local Development

Running in the REPL

```julia
using DataLoader
DataLoader.initialise_experiment_manager()        # Connect to GraphQL
DataLoader.App.__init__()                         # Load config.yaml
DataLoader.App.execute_config_loader_functions()  # Run all configured loaders
```

Loading a Specific Function

```julia
# Load only environment objects (still requires config.yaml definition)
DataLoader.App.execute_config_loader_functions(["envobj"])
```
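This page does not say whether naming a single loader also runs its depends-on prerequisites. If you need them, a hypothetical helper like the one below could expand the requested names from the config's depends-on lists, prerequisites first, before passing them to execute_config_loader_functions.

```julia
# Hypothetical helper (not DataLoader's API): expand requested loader names with all
# transitive depends-on entries, listing prerequisites before the loaders needing them.
function with_dependencies(loaders::AbstractDict, requested::Vector{String})
    ordered = String[]
    visiting = Set{String}()
    # Depth-first visit: emit a loader only after all of its dependencies.
    function visit(name)
        name in ordered && return
        name in visiting && error("cyclic depends-on at '$name'")
        push!(visiting, name)
        for dep in get(loaders[name], "depends-on", String[])
            visit(dep)
        end
        delete!(visiting, name)
        push!(ordered, name)
    end
    foreach(visit, requested)
    return ordered
end

loaders = Dict("segment" => Dict{String,Any}(),
               "osm"     => Dict{String,Any}("depends-on" => ["segment"]),
               "envobj"  => Dict{String,Any}())
selected = with_dependencies(loaders, ["osm"])  # ["segment", "osm"]
# DataLoader.App.execute_config_loader_functions(selected)
```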

Loading OSM Data

```julia
using DataLoader, LightOSM

# Build a graph from a local .osm file with LightOSM, then hand it to Data Loader
g = graph_from_file("data/melb_cbd_1km.osm")
DataLoader.App.load_osm_network(g)
```

Operational Notes

Environment Object Reloading

When environment object IDs change, the database tables must be updated to remove the old versions. Because device data is cross-referenced across the envobj, device static, and device actual tables, deletions must happen in order: first from the device static and device actual tables, then from the envobj table. Deleting envobj rows first would leave device rows that still reference environment objects that no longer exist.
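The ordering rule generalises: a table must be cleared before any table it references. The sketch below derives a delete order from a references map; device_static, device_actual and envobj are hypothetical placeholders for the real table names, and the references themselves are assumed from the description above.

```julia
# Given table => [tables it references], return an order in which rows can be
# deleted without leaving dangling references (referencing tables go first).
function delete_order(references::AbstractDict)
    tables = Set{String}(keys(references))
    for targets in values(references)
        union!(tables, targets)
    end
    order = String[]
    remaining = copy(tables)
    while !isempty(remaining)
        # A table can be cleared once no other remaining table references it.
        ready = sort([t for t in remaining
                      if !any(t in get(references, r, String[]) for r in remaining if r != t)])
        isempty(ready) && error("circular references between tables")
        append!(order, ready)
        setdiff!(remaining, ready)
    end
    return order
end

# Hypothetical table names for the envobj / device static / device actual tables.
refs = Dict("device_static" => ["envobj"], "device_actual" => ["envobj"])
println(delete_order(refs))  # prints ["device_actual", "device_static", "envobj"]
```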

Base Map Updates

Map files are loaded in Dev first and then promoted through the higher environments, so each environment should end up with the same file. Mismatched map files between environments can cause inconsistent behaviour between them. When loading a new map, the internal map table must also be updated so that the rest of the application picks up the change.
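One way to catch mismatched map files before promotion is to compare checksums of each environment's copy. This sketch uses Julia's SHA standard library; the file paths and environment names are hypothetical, and in practice each copy might first be downloaded from that environment's S3 location.

```julia
using SHA

# Hex-encoded SHA-256 of a file's contents.
file_sha256(path) = open(io -> bytes2hex(sha256(io)), path)

# Compare one map file per environment; mismatch is true if any checksum differs.
function mismatched_maps(paths_by_env::AbstractDict)
    hashes = Dict(env => file_sha256(path) for (env, path) in paths_by_env)
    return (mismatch = length(unique(values(hashes))) > 1, hashes = hashes)
end

# Hypothetical local copies of the same map pulled from each environment:
# result = mismatched_maps(Dict("dev" => "maps/dev/knox.osm", "uat" => "maps/uat/knox.osm"))
# result.mismatch && @warn "Base map differs between environments" result.hashes
```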

Related Services

  • Data Transformer — Uses loaded reference data for spatial tagging and transformation
  • Data Fusion — Depends on hex and segment data loaded by Data Loader
  • Data Archiver — Archives data from tables that Data Loader initially populates
  • Experiment Manager — Central coordination service (GraphQL on :5100)

User documentation for Optimal Reality