
Modelling Development Kit (MDK)

[Platform architecture diagram: Platform Users (engineers and low-code ops users) work through ORA, the SDK UI, FDK, DDK, MDK, and SCDK; the SDK API (GraphQL Federation Gateway) sits between these kits and the Domain IP microservices; Nexus deploys the resulting OR applications (rail, mine, and port operations dashboards) to Application Users in operations teams.]

The Modelling Development Kit (MDK) is the modelling and experimentation layer of the OR platform. It provides a unified environment for building and training custom models, orchestrating multi-model workflows, running systematic experiments, and tuning operational decision pipelines — across AI/LLM models, simulation engines, data transformers, and external data systems.

Who is this for?

  • ML engineers and data scientists composing multi-model computational pipelines — chain specialist models built in Python, Julia, or Go into a single end-to-end workflow
  • Research and analytics teams who need systematic experiment tracking: run a workflow with different parameter values, compare results across trials, and build confidence through repeatable analysis
  • Domain teams building decision-support tools that require model orchestration and a structured audit trail of what ran, with what configuration, and what it produced
  • Platform engineers integrating the MDK into a broader OR application — the MDK's backend is itself a DDK-generated server, so DDK familiarity helps

Platform Context

Why This System Exists

Real-world physical AI problems require combining multiple specialist models — a schedule generator, a traffic forecaster, a route optimiser, a capacity simulator — into a single end-to-end decision pipeline. These models are typically built independently, by different teams, in different languages.

The challenge is not always building the models — for many standard analytical tasks, the models exist. The challenge is connecting them: routing outputs from one model as inputs to the next, managing execution order, handling failures, and systematically testing how different parameter choices change outcomes. Without a shared execution environment, teams spend as much time on integration plumbing as on the models themselves. That said, some models are genuinely hard to build and represent OR IP — the Traffic Model's network System-of-Systems dynamics engine, high-efficiency linear systems solvers, and AI agent workloads are not commodity off-the-shelf components. The Modelling Library lists the specialist models available out of the box.

The MDK solves this. It provides a platform-level workflow engine where independent model services can be composed into pipelines and tested systematically — without requiring the teams who built them to agree on integration patterns.

The obvious alternative is to reach for an existing orchestration tool — and we tried that. The MDK originally used Apache Airflow as its execution backend. Airflow is a proven system for data pipeline orchestration, and its DAG model maps naturally to the concept of a workflow. But Airflow was designed for homogeneous compute: tasks that share a Python environment, access the same filesystem, and produce outputs that other Airflow tasks know how to read. OR's models are deliberately isolated microservices — a Julia simulation does not share an environment with a Python demand forecaster. Airflow could schedule the containers but could not understand their inputs and outputs, which meant every integration still fell back to bespoke code. We deprecated the Airflow wrapper and built the Workflow Execution Manager (WEM) in its place: an orchestrator that understands the OR model contract natively. The infrastructure for generating Airflow DAG definitions still exists in or-plugin-airflow, repurposed for code generation templates; the execution runtime does not. This was not a small decision, and it is worth being explicit about: MLflow, Prefect, and Kubeflow solve real problems, but they solve them for teams whose models share a compute environment. They are not built for heterogeneous microservice orchestration with a standardized but language-agnostic model API.

A second design decision worth naming upfront: the MDK is structured to keep operators in the loop before models are deployed to production — but the goal is autonomy. The Studies, Experiments, and Trials structure, the trial promotion mechanism, and the systematic comparison capability all exist to build confidence before outputs drive operational decisions. Engineers and analysts working on regulated physical systems need to understand and validate what their models are doing at the point of deployment. Once that confidence is established, the platform is designed to support progressively autonomous execution over time.

What this enables in practice:

  • A rail operator evaluating disruption response strategies can chain a schedule generation model → passenger demand model → capacity simulation → delay impact calculator. The MDK handles data routing and execution order; the team focuses on interpreting results. See or-demo-transport-rail for a working implementation of this pattern.

  • A mining operation testing different extraction schedules against route optimisation and cost models can define this as a single workflow, vary extraction parameters across experiments, and compare outcomes — without building a custom experiment runner.

  • A logistics team combining real-time traffic data with a route optimiser and schedule generator can run systematic experiments varying time windows, capacity constraints, or routing priorities to find the best operational configuration.

  • An autonomous AI agent monitoring an asset network identifies a high-priority fault alert, runs a predictive maintenance model to assess likely impact and timing, and automatically schedules a maintenance crew — surfacing the recommendation to the operator for confirmation before execution.

In each case, the value is the same: specialist models that would otherwise be siloed become a coordinated system. Teams move from model development to comparative analysis faster, and decisions are grounded in systematic experimentation rather than one-off runs.


Core Concepts and Principles

Specialist models stay independent — but work together

Each model in the MDK runs as its own isolated service with a standard API interface. Because the MDK only needs to understand a model's inputs and outputs — not its internal workings — domain experts can build and maintain models in whichever language and framework they know best: Julia, Python, Go, or any service exposing a standard HTTP endpoint. This means a new or improved model can be added to an existing workflow, or swapped out for a better version, without touching anything else in the pipeline. Teams remain autonomous while their models become interoperable.

The critical enabler is the OR HTTP contract: a standardized three-type pattern (Input, Params, Output) that every model implements. This is not a schema format or a serialization library — it is a communication contract that any language can fulfill over plain HTTP. We deliberately chose HTTP over gRPC for model invocation. gRPC would offer better throughput and streaming, but it requires shared proto definitions, which couples every model to OR's toolchain. A Julia modeller should not need to manage protobuf; they should implement a POST endpoint. The contract is simple enough to implement in an afternoon in any language, and it is strict enough for the WEM to reason about data routing without reading model internals. There are four component types that define how the MDK interacts with each task — from issuing HTTP requests to containerised MODEL services, to executing custom Python OPERATOR functions, to pulling or pushing structured data via IMPORT_DDK and EXPORT_DDK connections to DDK-generated data servers. Before a model can be invoked in a workflow, its container must be deployed into the cluster; Nexus manages that deployment lifecycle, and the Architecture page covers how model services are discovered and registered once they are running.
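
To make the shape of that contract concrete, here is a minimal sketch of a Python model built with FastAPI and Pydantic (the Python stack listed under Built With). The endpoint path, field names, and model logic are illustrative placeholders, not the actual OR specification:

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    # The three-type pattern described above; all names below are illustrative.
    class Input(BaseModel):
        passenger_counts: list[float]   # hypothetical data from an upstream task

    class Params(BaseModel):
        horizon_hours: int = 24         # hypothetical tuning parameter

    class ModelRequest(BaseModel):
        input: Input
        params: Params

    class Output(BaseModel):
        forecast: list[float]

    @app.post("/forecast", response_model=Output)
    def forecast(req: ModelRequest) -> Output:
        # Stand-in for real model logic: repeat the last observed value.
        last = req.input.passenger_counts[-1] if req.input.passenger_counts else 0.0
        return Output(forecast=[last] * req.params.horizon_hours)

The same three-type shape can be implemented in Julia with Oxygen.jl or in Go; the platform only needs the service's published Swagger/OpenAPI description of Input, Params, and Output to wire it into a workflow.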

Workflows are visual and version-controlled

Workflows are defined as directed graphs — a visual diagram of steps with arrows showing which outputs feed into which inputs. This makes complex multi-step analytical pipelines understandable and auditable without reading code. Teams can see exactly what runs and in what order, and when a workflow changes, the platform versions it automatically so previous configurations are preserved and recoverable. The Workflow Execution Manager handles everything that happens when a workflow runs: constructing the task dependency graph, scheduling tasks in priority order, substituting variables, and tracking execution state from PENDING through to SUCCESSFUL or FAILED. The underlying Data Schema defines the Workflow, WorkflowTask, and WorkflowTaskDependency entities the visual diagram is built on — including how configuration is layered across model defaults, task overrides, and experiment-specific values.
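
The real entity definitions are covered on the Data Schema page; purely as a sketch of the shape those entities take (field names here are illustrative, not the actual schema), a two-task workflow could be represented along these lines:

    # Illustrative sketch of the Workflow / WorkflowTask / WorkflowTaskDependency
    # entities described above -- field names are placeholders, not the real schema.
    from dataclasses import dataclass, field
    from enum import Enum

    class TaskState(str, Enum):
        PENDING = "PENDING"
        RUNNING = "RUNNING"
        SUCCESSFUL = "SUCCESSFUL"
        FAILED = "FAILED"

    @dataclass
    class WorkflowTask:
        id: str
        component_type: str               # MODEL, OPERATOR, IMPORT_DDK or EXPORT_DDK
        model_config_id: str | None = None
        state: TaskState = TaskState.PENDING

    @dataclass
    class WorkflowTaskDependency:
        upstream_task_id: str             # task whose output feeds the downstream task
        downstream_task_id: str

    @dataclass
    class Workflow:
        id: str
        version: int                      # bumped automatically when the workflow changes
        tasks: list[WorkflowTask] = field(default_factory=list)
        dependencies: list[WorkflowTaskDependency] = field(default_factory=list)

    # A two-step pipeline: a schedule generator feeding a demand model.
    wf = Workflow(
        id="rail-disruption",
        version=3,
        tasks=[
            WorkflowTask("schedule", "MODEL", "schedule-generator-cfg"),
            WorkflowTask("demand", "MODEL", "demand-forecaster-cfg"),
        ],
        dependencies=[WorkflowTaskDependency("schedule", "demand")],
    )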

Data moves automatically between models, regardless of how they were built

The MDK uses a Data Abstraction Layer (DAL) that handles all data transfer between models behind the scenes. A Go model's output is automatically made available as a Python model's input — no manual conversion, no custom integration code. This is what makes it possible to combine models built by different teams with different technologies in a single workflow: the platform absorbs the integration cost that would otherwise fall on each team. The Data Flow & Transformations page covers how this works in practice: the three DAL transfer modes (shared file system for large datasets, Redis for smaller state, and direct HTTP), how the WEM matches output field names from one task to input field names on the next, and how result caching with useCache lets the platform skip re-running expensive tasks when inputs haven't changed. IMPORT_DDK and EXPORT_DDK component types extend this further by connecting workflows directly to DDK-generated data servers, so structured domain data can flow into a workflow as an input or be written back out as a result.
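
As a toy illustration of that routing behaviour — the function names and size thresholds below are invented for the example, not taken from the DAL implementation — the name matching and transfer-mode choice amount to something like this:

    from typing import Any

    def route_outputs(upstream_output: dict[str, Any],
                      downstream_input_fields: set[str]) -> dict[str, Any]:
        """Pass through each upstream output field whose name matches a
        downstream input field; unmatched fields are simply not routed."""
        return {name: value
                for name, value in upstream_output.items()
                if name in downstream_input_fields}

    def pick_transfer_mode(payload_size_bytes: int) -> str:
        """Which DAL mode carries the payload. Thresholds are invented for
        illustration; only the three modes come from the documentation."""
        if payload_size_bytes > 50_000_000:
            return "file"    # large datasets via the shared file system
        if payload_size_bytes > 100_000:
            return "redis"   # smaller intermediate state via Redis
        return "raw"         # small values directly over HTTP

    # Example: a schedule generator's output feeding a demand model's input.
    schedule_output = {"timetable": ["svc-001", "svc-002"], "num_services": 2, "debug_log": "..."}
    demand_input_fields = {"timetable", "num_services"}
    print(route_outputs(schedule_output, demand_input_fields))
    # {'timetable': ['svc-001', 'svc-002'], 'num_services': 2}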

Workflow-level tuning, not just single model tuning

The MDK is built around the concept of experiments: you define a workflow, then run it with different parameter values to see how the end-to-end pipeline outcome changes. This is workflow tuning — varying configuration across the entire chain of models, operators, and data connections — not just single-model parameter sweeps. Each run is stored as a distinct trial, and results are compared side by side. This mirrors how analysts and engineers actually work — not accepting the first answer, but testing assumptions, varying inputs, and building confidence through repeated analysis. The structure of Studies, Experiments, and Trials makes this systematic rather than ad hoc.

This structure is more opinionated than what most experiment tracking tools impose. MLflow and Weights & Biases are excellent at recording what happened when a model ran; they are less opinionated about how you structure the comparison. The MDK's Studies → Experiments → Trials hierarchy enforces a specific decision-making process: you articulate a research objective (Study), define a specific parameter configuration to test (Experiment), and execute it one or more times (Trials). The hierarchy exists because teams working on regulated physical systems need an audit trail that shows not just what the model produced, but what question was being asked and what configuration was used to answer it. Every TrialTaskValue is snapshotted and made immutable at execution time, so a trial run three months ago can be reproduced exactly even if the workflow has changed since. That level of traceability is a deliberate design choice, and it trades away some of the flexibility of freeform experiment tracking. The Workflow Execution Manager documents the full execution lifecycle for experiments: how trials are created and their configurations snapshotted at the point of execution, how variable substitution allows the same workflow to run with different parameter values across trials, and how iterative loop-based workflows are supported for convergence problems. The Data Schema covers the Study → Experiment → Trial entity hierarchy and how per-task outputs — including logs, visualisation metadata, and structured results — are stored for post-run comparison.
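
As a sketch of how that hierarchy hangs together (field names are illustrative, not the real entities), a study comparing two parameter configurations might be structured like this:

    # Illustrative sketch of the Study -> Experiment -> Trial hierarchy and the
    # trial-time snapshotting described above; field names are placeholders.
    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass
    class Trial:
        id: str
        # Values snapshotted at execution time so the run stays reproducible
        # even if the workflow or experiment configuration changes later.
        snapshotted_values: dict[str, float]
        started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    @dataclass
    class Experiment:
        id: str
        workflow_id: str
        variables: dict[str, float]       # the parameter configuration under test
        trials: list[Trial] = field(default_factory=list)  # populated by the WEM at run time

    @dataclass
    class Study:
        id: str
        objective: str                    # the research question being asked
        experiments: list[Experiment] = field(default_factory=list)

    study = Study(
        id="disruption-response",
        objective="How does recovery time change with fleet size?",
        experiments=[
            Experiment("fleet-20", "rail-disruption", {"fleet_size": 20}),
            Experiment("fleet-30", "rail-disruption", {"fleet_size": 30}),
        ],
    )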


Role in the Optimal Reality Platform

The MDK occupies the workflow orchestration layer of the Optimal Reality platform. It sits between the API gateway and the model execution runtime, coordinating the movement of data through computational pipelines.

The MDK is responsible for:

  • Workflow definition and versioning — storing directed acyclic and cyclic graphs (DAG/DCG) of computational tasks
  • Experiment orchestration — binding parameter values to workflows and managing their execution via the Workflow Execution Manager (WEM)
  • Model registry — discovering and registering model microservices deployed to the cluster
  • Data coordination — routing task outputs to subsequent task inputs via the Data Abstraction Layer (DAL)

The MDK interacts with:

  • DDK — application teams use DDK servers as the structured data layer for OR apps; via the IMPORT_DDK and EXPORT_DDK task types, workflows read domain data in from these servers and write results back out.
  • Nexus — model containers must be deployed to the Kubernetes cluster before the WEM can invoke them; Nexus manages this deployment lifecycle.

How It Fits in the End-to-End Platform

Domain UIs and Applications (built with FDK)

MDK Workflow Builder / Experiment Runner (in OR SDK UI)

or-sdk-api (GraphQL API Gateway)

or-app-experiment-manager (MDK — Workflow Orchestration & WEM)

┌────────────────────────────────────────────────────────┐
│  Workflow Execution Manager (WEM)                      │
│  ├── Model Collection microservices (Python/Julia/Go)  │
│  ├── PyRunner — custom Python operator execution       │
│  └── DDK data servers — import/export task types       │
└────────────────────────────────────────────────────────┘

PostgreSQL + Redis (data persistence and DAL transfer)

KinD / Kubernetes (container orchestration, managed via Nexus)

The MDK is the primary execution engine for computational work in OR. Models sourced from the product app store or built by users feed into it. Nexus manages the deployment lifecycle for model containers. Domain teams surface results to end users through FDK-built applications.


Overview

The MDK enables teams to:

  • Orchestrate Workflows — Chain data ingestion, models, operators, and outputs into end-to-end directed acyclic graph (DAG) and directed cyclic graph (DCG) workflows
  • Manage Models — Import pre-built model collections from Swagger definitions or create custom models via language-specific templates (Python, Julia, Go)
  • Run Experiments — Organise execution into Studies, Experiments, and Trials for systematic parameter variation and comparison, enabling workflow tuning
  • Connect Data Systems — Import from and export to DDK GraphQL servers via typed data connections
  • Transform Data — Apply custom operators for data manipulation between workflow tasks
  • Version Everything — Track workflow and model versions with rollback support
  • Deploy Workflows — Schedule (cron, one-off), trigger on DDK events, stream via Redis, or run on startup

Key Concepts

  • Study — A high-level container for related experiments sharing a common research objective
  • Experiment — A specific execution of a workflow with particular variable values and per-task overrides
  • Workflow — A versioned DAG/DCG of tasks defining the execution flow of computational components
  • WorkflowTask — An individual execution unit within a workflow, configured with a model config, inputs, and parameters
  • Model — A microservice exposing POST endpoints with defined Input, Params, and Output types via Swagger/OpenAPI
  • ModelConfig — The configuration of a model, allowing users to set bespoke Input and Params values for each task execution
  • Model Collection — A microservice housing one or many models, deployed as a container in the KinD cluster
  • Operator — A custom function (Python) that transforms or processes data between tasks
  • Data Connection — A component that imports from or exports to DDK GraphQL servers
  • Function — Stored source code executed by the PyRunner service for operators and data connections
  • Component Type — Classification of a task: MODEL, OPERATOR, IMPORT_DDK, or EXPORT_DDK
  • DAL — Data Abstraction Layer, which enables cross-language data communication via file, Redis, or raw transfer modes
  • WEM — Workflow Execution Manager, the in-house runtime engine that orchestrates workflow execution

Quick Start

1. Install the Experiment Manager

When a user first navigates to the MDK in their OR project, the system automatically deploys the required services to the KinD cluster:

  • mdk-experiment-manager — The core GQL API server
  • mdk-postgres — PostgreSQL database for workflow/model/experiment data
  • mdk-redis — Redis for caching and DAL data transfer
  • mdk-federation — Apollo Federation gateway for FE client access
  • or-ast-py-runner — Python executor for operators and data connections

2. Build Workflows

Use the Workflow Builder UI to create workflows by dragging components from the component library onto the canvas. Connect tasks with dependencies to define execution order and data flow.

3. Configure Tasks

Each workflow task is backed by a model configuration defining its inputs, parameters, and outputs. Configure task-specific values and map variables for runtime substitution.
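
As a hypothetical illustration of what variable mapping does at run time — the ${...} placeholder syntax below is illustrative, not necessarily the MDK's own — the experiment's variable values are substituted into the task's parameter template before the task is invoked:

    # Hypothetical illustration of runtime variable substitution in task params.
    from string import Template

    task_params_template = {
        "horizon_hours": "${horizon}",
        "capacity_limit": "${capacity}",
    }
    experiment_variables = {"horizon": "48", "capacity": "1200"}

    resolved = {key: Template(value).substitute(experiment_variables)
                for key, value in task_params_template.items()}

    print(resolved)  # {'horizon_hours': '48', 'capacity_limit': '1200'}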

4. Run Experiments

Create Studies and Experiments to systematically test different configurations. The WEM executes workflows as DAGs, managing task state, outputs, and logs.

5. Analyse Results

View experiment outputs and visualisations to compare results across different parameter variations.

Built With

  • Go — Server language for the Experiment Manager
  • gqlgen — GraphQL server generation (via DDK)
  • PostgreSQL — Primary data store (GORM ORM)
  • Redis — Caching layer and DAL data transfer
  • Gin — HTTP framework
  • KinD (Kubernetes in Docker) — Local cluster deployment
  • Apollo Federation — Subgraph federation for FE access
  • FastAPI / Pydantic — Python model framework
  • Oxygen.jl / SwaggerMarkdown.jl — Julia model framework
  • Docker / ECR — Container deployment for model collections

Repositories

  • or-app-experiment-manager — Core MDK API: schema, resolvers, WEM
  • or-sdk-api — SDK resolvers for MDK setup, model fetching, and KinD installation
  • or-plugin-airflow — DAG codegen templates (Airflow wrapper deprecated, codegen still used)
  • or-plugin-boilerplate — gRPC plugin for generating model boilerplate from templates (see Plugin Docs)
  • or-sdk-protobuf — Protobuf definitions for gRPC communication

Next Steps

Go deeper into the MDK:

  • Architecture — how model services are discovered and registered in the cluster
  • Workflow Execution Manager — the execution lifecycle for workflows, experiments, and trials
  • Data Schema — the Workflow, Study, Experiment, and Trial entities and configuration layering
  • Data Flow & Transformations — DAL transfer modes, field matching, and result caching
  • Modelling Library — the specialist models available out of the box

Connect to the rest of the platform:

  • DDK — the MDK's own backend is a DDK-generated server; IMPORT_DDK and EXPORT_DDK task types connect to DDK data servers
  • Nexus — model containers must be deployed to the cluster before the WEM can invoke them
