Data Loader

Overview

Data Loader is responsible for populating the OR platform's relational databases with the static and reference data that underpins all real-time operations. Before any live data can be ingested, transformed, or displayed, the platform needs a foundation of geographic, device, and configuration data — maps, road segments, traffic signal locations, public transport routes, user accounts, and baseline metrics. Data Loader builds this foundation.

The service runs on stack startup or when reference data updates are required. It connects to a secure AWS S3 bucket that serves as the landing zone for static files, fetches the configured datasets, processes them through source-specific loading pipelines, and populates the PostgreSQL database with correctly structured reference data. This data then feeds into every downstream service — from Data Transformer (which uses it for spatial tagging) to the frontend (which uses it for map rendering).

Data can arrive in the S3 landing zone through several paths: pushed from the client side using the AWS SDK with authenticated secure transfer, pulled from the OR side via a custom ETL pipeline, or manually uploaded by the Optimal Reality support team on an ad-hoc basis.

Architecture

Port: :2000
Language: Julia
Scaling: Singleton

Key Components

Configuration-driven loading — A config.yaml file defines which datasets to load, their S3 locations, fetch methods, and load order. The config is read on startup and can also be passed as a dictionary to the running server's execute endpoint.
Fetch + Load pipeline — Each data source follows a two-step process:
1. Fetch: Download data from S3 (or other source) and save locally on disk or in memory
2. Load: Insert the processed data into the relevant PostgreSQL table
Dependency management — Load functions can declare dependencies on other load functions, ensuring correct execution order (e.g. segments must be loaded before OSM data).
GraphQL integration — Requires an active Experiment Manager (GraphQL) connection for coordination and to verify database readiness.
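
The dependency management described above amounts to a topological ordering of the load functions. The following is a minimal sketch of that idea, not Data Loader's actual implementation: the function name `resolve_load_order` and the dependency table are hypothetical, and the `envobj` dependency on `segment` is an assumption made for illustration.

```julia
# Hypothetical dependency table: loader name => loaders that must finish first.
# Only "osm depends on segment" comes from the documented config; the envobj
# entry is an assumption for illustration.
depends_on = Dict(
    "segment" => String[],
    "osm"     => ["segment"],
    "envobj"  => ["segment"],
)

# Return loader names ordered so that every loader appears after its dependencies.
# Raises an error on an unknown dependency or a dependency cycle.
function resolve_load_order(depends_on::Dict{String,Vector{String}})
    order = String[]
    state = Dict{String,Symbol}()   # :visiting while on the stack, :done once emitted

    function visit(name)
        haskey(depends_on, name) || error("unknown loader function: $name")
        s = get(state, name, :new)
        s === :done && return
        s === :visiting && error("dependency cycle involving: $name")
        state[name] = :visiting
        for dep in depends_on[name]
            visit(dep)
        end
        state[name] = :done
        push!(order, name)
    end

    # Sort keys so the result does not depend on Dict iteration order.
    foreach(visit, sort!(collect(keys(depends_on))))
    return order
end

order = resolve_load_order(depends_on)
```

A depth-first traversal keeps the sketch short and detects cycles as a side effect; any ordering in which `segment` precedes `osm` would satisfy the constraint stated in the text.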

Dependencies

The following services must be running for Data Loader to operate:

Experiment Manager (GraphQL)
PostgreSQL (primary)
PostgreSQL (simulation)
Redis

Data Flow

AWS S3 (static files landing zone)
        ↓ (fetch configured datasets)
Data Loader [:2000]
  ├── Parse config.yaml
  ├── Resolve dependencies between load functions
  ├── Fetch data from S3
  └── Load into PostgreSQL tables
        ↓
PostgreSQL (Core Reference Schema)
        ↓
Experiment Manager → All downstream services
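
The fetch and load stages in the flow above can be pictured as two functions chained per data source. The sketch below illustrates only that shape: `DataSource`, `run_source`, and the two stand-in functions are hypothetical names, the fetch step writes a temporary file instead of downloading from S3, and the load step counts rows instead of inserting into PostgreSQL.

```julia
# A data source pairs a fetch step (returns a local file path) with a load step
# (consumes that file). Hypothetical stand-ins, not DataLoader functions.
struct DataSource
    name::String
    fetch::Function   # () -> local file path
    load::Function    # (path) -> number of rows loaded
end

# Run the two steps in order and report how many rows were loaded.
function run_source(src::DataSource)
    path = src.fetch()
    isfile(path) || error("fetch for $(src.name) did not produce a file: $path")
    rows = src.load(path)
    return (name = src.name, rows = rows)
end

# Stand-in fetch: writes a small CSV to a temp file instead of downloading from S3.
function fake_fetch_segments()
    path = joinpath(mktempdir(), "segment.csv")
    write(path, "id,name\n1,Main St\n2,High St\n")
    return path
end

# Stand-in load: counts data rows instead of inserting into PostgreSQL.
function fake_load_segments(path)
    lines = readlines(path)
    return length(lines) - 1   # exclude the header row
end

result = run_source(DataSource("segment", fake_fetch_segments, fake_load_segments))
```

Keeping fetch and load separate means a failed download can be reported before any database write is attempted.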

Data Sources

Data Loader manages the following reference datasets:

Dataset	Description	Format	Used By
OSM Map	OpenStreetMap ways and nodes	`.osm`	Situational Awareness, Schedule Generation
Segments	Static segment polygons and metadata	`.csv`	Situational Awareness
Environment Objects	CCTV, LUMS, SCATS intersections, sensors	`.csv`	Situational Awareness
Users & Roles	Login credentials and personalisation data	`.csv`	Core platform
Baselines	Segment and way metric baselines	`.csv`	Situational Awareness, Schedule Generation
Paths	All possible paths within the area of interest	`.csv`	Traffic Model
Schedule Environment	Segment lane data, intersection mappings	`.csv`	Traffic Model
SCATS Detector Counts	Historical detector count values	`.csv`	Schedule Generation

Configuration

The config.yaml specifies fetch and load behaviour for each data source:

yaml

init-functions:
  data-archive:
    s3-bucket-name: or-tfnsw-poc-data
    s3-archive-folder: archive
    postgres-table-names:
      - envobj_actual
      - event_actual
      - segment_actual
      - way_actual
      - event
    depends-on:
      - loader-functions

loader-functions:
  segment:
    load-method: s3
    download-parameters:
      segment:
        s3-bucket-name: or-dot-poc-data
        s3-file-key: data_loader/segment.csv
  osm:
    load-method: s3
    download-parameters:
      osm:
        s3-bucket-name: or-dot-poc-data
        s3-file-key: data_loader/knox.osm
    depends-on:
      - segment
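
Because the config can also be passed as a dictionary, it may help to see the `loader-functions` block above written as a Julia `Dict`. This is a sketch under the assumption that the dictionary form uses the same keys as the YAML; the helper `s3_downloads` is hypothetical and only walks the structure to list what would be fetched.

```julia
# Dictionary mirroring the loader-functions block of config.yaml.
# Assumption: the dictionary form uses the same keys as the YAML file.
config = Dict(
    "loader-functions" => Dict(
        "segment" => Dict(
            "load-method" => "s3",
            "download-parameters" => Dict(
                "segment" => Dict(
                    "s3-bucket-name" => "or-dot-poc-data",
                    "s3-file-key" => "data_loader/segment.csv",
                ),
            ),
        ),
        "osm" => Dict(
            "load-method" => "s3",
            "download-parameters" => Dict(
                "osm" => Dict(
                    "s3-bucket-name" => "or-dot-poc-data",
                    "s3-file-key" => "data_loader/knox.osm",
                ),
            ),
            "depends-on" => ["segment"],
        ),
    ),
)

# Collect (loader, bucket, key) for every S3 download declared in the config,
# sorted by loader name.
function s3_downloads(cfg)
    downloads = NamedTuple[]
    for (name, spec) in cfg["loader-functions"]
        get(spec, "load-method", "") == "s3" || continue
        for params in values(get(spec, "download-parameters", Dict()))
            push!(downloads, (loader = name,
                              bucket = params["s3-bucket-name"],
                              key = params["s3-file-key"]))
        end
    end
    return sort!(downloads; by = d -> d.loader)
end

downloads = s3_downloads(config)
```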

Configuration Options

Option	Description	Required
`load-method`	How to fetch the data (e.g. `s3`, `lightosm-placename`, `lightosm-radius`)	Yes
`download-parameters`	S3 bucket name and file key, or other fetch-method-specific params	Yes (if S3)
`overwrite-local-files`	Controls whether an existing local copy is overwritten; when disabled, the download is skipped if a local copy exists	No
`depends-on`	List of loader functions that must complete first	No
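
The requirements in the table above can be checked before any loader runs. The following sketch encodes only the rules stated in the table (`load-method` is required, and `download-parameters` is required when the method is `s3`); the function `validate_loader` is hypothetical and not part of the DataLoader API.

```julia
# Return a list of problems found in one loader-function entry.
# Rules follow the options table: load-method is required, and
# download-parameters is required when the load method is s3.
function validate_loader(name::AbstractString, spec::AbstractDict)
    problems = String[]
    method = get(spec, "load-method", nothing)
    if method === nothing
        push!(problems, "$name: missing required option load-method")
    elseif method == "s3" && !haskey(spec, "download-parameters")
        push!(problems, "$name: load-method s3 requires download-parameters")
    end
    # depends-on is optional, but when present it should be a list of names.
    deps = get(spec, "depends-on", String[])
    deps isa AbstractVector || push!(problems, "$name: depends-on must be a list")
    return problems
end

problems = validate_loader("osm", Dict("depends-on" => ["segment"]))
```

Returning a list of messages rather than throwing on the first problem lets every issue in a config be reported in one pass.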

Local Development

Running in the REPL

julia

using DataLoader
DataLoader.initialise_experiment_manager()  # Connect to GraphQL
DataLoader.App.__init__()                  # Load config.yaml
DataLoader.App.execute_config_loader_functions()  # Run all configured loaders

Loading a Specific Function

julia

# Load only environment objects (still requires config.yaml definition)
DataLoader.App.execute_config_loader_functions(["envobj"])

Loading OSM Data

julia

using DataLoader, LightOSM

# Build an OSM graph from a local .osm file
g = graph_from_file("data/melb_cbd_1km.osm")

# Load the OSM network into the database
DataLoader.App.load_osm_network(g)

Operational Notes

Environment Object Reloading

When environment object IDs change, the database tables must be updated to remove old versions. Because device data is cross-referenced across the envobj, device static, and device actual tables, deletions must happen in order — first from static and actual tables, then from the envobj table.
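
The ordering constraint above can be made explicit by generating the delete statements in a fixed sequence. This is a sketch only: the table names `device_static`, `device_actual`, and `envobj` and the column name `envobj_id` are assumptions, not the platform's real schema, and the statements are built as strings rather than executed.

```julia
# Hypothetical table and column names; substitute the real schema before use.
# Referencing tables come first so that rows pointing at envobj are removed
# before the envobj rows themselves.
deletion_order = ["device_static", "device_actual", "envobj"]

# Build one parameterised DELETE per table, preserving the given order.
function deletion_statements(tables::Vector{String}; id_column::String = "envobj_id")
    return ["DELETE FROM $t WHERE $id_column = ANY(\$1);" for t in tables]
end

statements = deletion_statements(deletion_order)
```

Running the statements inside a single transaction would avoid leaving the tables half-updated if one delete fails.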

Base Map Updates

Map files are loaded in Dev and promoted through higher environments. Mismatched files between environments can cause issues. When loading a new map, the internal map table must also be updated so that the broader application picks up the change.
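
One way to catch mismatched map files before promotion is to compare content digests. The sketch below uses the SHA standard library; the helper names `file_digest` and `same_map_file` are hypothetical, and how the files from each environment are obtained is left out.

```julia
using SHA  # Julia standard library

# Hex-encoded SHA-256 digest of a file's contents.
file_digest(path::AbstractString) = bytes2hex(open(sha256, path))

# True when both files have identical contents.
same_map_file(a::AbstractString, b::AbstractString) = file_digest(a) == file_digest(b)
```

For example, `same_map_file("dev/knox.osm", "test/knox.osm")` (paths are illustrative) would return `false` if the two environments hold different versions of the map.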

Related Services

Data Transformer: Uses loaded reference data for spatial tagging and transformation
Data Fusion: Depends on hex and segment data loaded by Data Loader
Data Archiver: Archives data from tables that Data Loader initially populates
Experiment Manager: Central coordination service (GraphQL on :5100)
