
Workflow Execution Manager

[Platform architecture diagram: Platform Users (engineers and low-code ops users) and the ORA AI planning interface sit above the SDK layers — FDK frontend shell, DDK schema definition and code generation, MDK (WEM, DAL, Experiment Manager), Nexus deployment control, and SCDK source control — backed by a GraphQL Federation Gateway API and domain microservices. Nexus deploys the resulting OR applications (rail, mine, and port operations dashboards) used by operations teams.]

Overview

The Workflow Execution Manager (WEM) is the core runtime engine of the MDK. It orchestrates the execution of computational workflows, managing task scheduling, state transitions, variable substitution, and output coordination across heterogeneous model microservices.

Why the WEM Exists

Traditional workflow engines like Apache Airflow are designed for homogeneous compute environments — tasks that share a Python environment, access the same filesystem, and produce outputs in known formats. The MDK's models are deliberately heterogeneous: a Julia simulation doesn't share an environment with a Python demand forecaster or a Go route optimizer.

The WEM was built to orchestrate isolated microservices that communicate over HTTP. It understands the OR model contract (Input, Params, Output for every model) and can automatically route data between tasks written in different languages without requiring custom integration code.
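To make the contract concrete, here is a minimal sketch of what the Input/Params/Output shape could look like from the WEM's side. The class and function names (`ModelRequest`, `invoke_model`, `demand_forecaster`) are illustrative assumptions, not the actual WEM API; the real invocation is an HTTP POST to a containerized endpoint.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical sketch of the OR model contract: every model accepts
# Input + Params and returns Output, regardless of its implementation language.
@dataclass
class ModelRequest:
    input: dict[str, Any]    # upstream task outputs, routed by the WEM
    params: dict[str, Any]   # resolved configuration for this run

@dataclass
class ModelResponse:
    output: dict[str, Any]   # payload routed to downstream tasks
    logs: list[str] = field(default_factory=list)

def invoke_model(handler, request: ModelRequest) -> ModelResponse:
    """Stand-in for the HTTP POST the WEM issues to a model container."""
    return handler(request)

# A toy "model" written against the contract:
def demand_forecaster(req: ModelRequest) -> ModelResponse:
    horizon = req.params.get("horizon", 7)
    return ModelResponse(output={"forecast": [req.input["baseline"]] * horizon})
```

Because every model speaks this shape, the WEM can wire a Julia simulation's Output into a Python forecaster's Input without bespoke glue code.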

Core Responsibilities

The WEM handles:

  • Task Scheduling — Determining execution order based on dependencies
  • State Management — Tracking task status (PENDING, IN_PROGRESS, SUCCESSFUL, FAILED)
  • Variable Substitution — Applying parameter values to task configurations
  • Data Routing — Coordinating output-to-input mappings through the Data Abstraction Layer
  • Execution Coordination — Invoking models, operators, and data connections
  • Loop Handling — Supporting iterative workflows for convergence problems
  • Result Capture — Storing outputs, logs, and metadata for comparison

Execution Lifecycle

1. Workflow Initialization

When a workflow or experiment is triggered:

  1. The WEM loads the complete workflow definition including all tasks and dependencies
  2. If executing an experiment, it creates or loads a trial and snapshots the configuration
  3. Variables are merged from multiple sources (execution-time values, workflow defaults)
  4. All tasks are initialized to PENDING state
  5. A unique execution ID is generated

2. Task Scheduling

The WEM builds a dependency graph and identifies execution-ready tasks:

  1. Dependency Analysis — Creates an adjacency list from task dependencies
  2. Entry Point Identification — Finds tasks with no upstream dependencies
  3. Priority-Based Queuing — Orders ready tasks by priority (higher priority first)
  4. Execution Loop Begins — Processes tasks as they become ready

Priority-Based Scheduling ensures that when multiple tasks are ready simultaneously, critical path tasks execute first. Priority also serves as the cycle-breaking mechanism for loops — in cyclic dependencies, the higher-priority task becomes the loop entry point.
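The scheduling steps above can be sketched with an adjacency view and a priority heap. The task-dict structure (`state`, `depends_on`, `priority`) is an assumption for illustration.

```python
import heapq

def ready_tasks(tasks: dict) -> list[str]:
    """Sketch of dependency analysis + priority-based queuing: a PENDING
    task is ready when all of its upstream dependencies are SUCCESSFUL;
    ready tasks are returned highest priority first."""
    done = {name for name, t in tasks.items() if t.get("state") == "SUCCESSFUL"}
    heap = []
    for name, task in tasks.items():
        if task.get("state") != "PENDING":
            continue
        if all(dep in done for dep in task.get("depends_on", [])):
            # negate priority so the highest priority pops first
            heapq.heappush(heap, (-task.get("priority", 0), name))
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

Entry points fall out naturally: a task with an empty `depends_on` list is ready on the first pass.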

3. Task Execution

For each ready task, the WEM follows a consistent execution pattern:

Configuration Resolution

  1. Merge configuration values from multiple layers:
    • Base model configuration (defaults)
    • Workflow task overrides
    • Experiment-specific overrides
    • Trial snapshots (for experiments)
  2. Apply variable substitution using merged variables
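The layered merge can be sketched as a fold over the configuration layers, later layers winning. A shallow merge is an assumption here; the WEM may merge nested configuration more deeply.

```python
def resolve_config(base: dict, *overrides: dict) -> dict:
    """Sketch of configuration resolution: start from the base model
    defaults and apply each override layer in order (workflow task
    overrides, experiment overrides, trial snapshots)."""
    merged = dict(base)
    for layer in overrides:
        merged.update(layer)   # later layers take precedence
    return merged
```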

Execution Path Selection

The WEM determines how to execute the task based on its component type:

Component Type — Execution Method:

  • MODEL — HTTP POST to containerized model endpoint
  • OPERATOR — Execute custom Python function
  • IMPORT_DDK — Query external data APIs, then execute import function
  • EXPORT_DDK — Execute export function, then call external APIs
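The dispatch above can be sketched as a switch on component type. The handler names are hypothetical stand-ins for the real execution paths.

```python
def execute_task(task: dict, handlers: dict):
    """Illustrative dispatch mirroring the component-type table."""
    kind = task["component_type"]
    if kind == "MODEL":
        return handlers["http_post"](task)         # POST to model container
    if kind == "OPERATOR":
        return handlers["run_python"](task)        # custom Python function
    if kind == "IMPORT_DDK":
        data = handlers["query_external"](task)    # fetch from external API
        return handlers["run_import"](task, data)
    if kind == "EXPORT_DDK":
        data = handlers["run_export"](task)        # build export payload
        return handlers["call_external"](task, data)
    raise ValueError(f"unknown component type: {kind}")
```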

Task Invocation

  1. Task state transitions to IN_PROGRESS
  2. The appropriate execution handler is invoked
  3. Output data, metadata, and logs are captured
  4. Task state transitions to SUCCESSFUL or FAILED
  5. For experiments, results are persisted to trial storage

4. Dependency Resolution

After each task completes, the WEM identifies downstream tasks that can now execute:

  1. Output is stored to the Data Abstraction Layer (file system, cache, or database)
  2. Dependent tasks are identified from the dependency graph
  3. Readiness is checked — all upstream dependencies must be SUCCESSFUL
  4. Ready tasks are queued for execution in priority order
  5. Process repeats until all tasks complete or a failure blocks progress

5. Completion and Result Storage

When all tasks have completed:

  1. Final state is determined (fully successful vs. partial failures)
  2. Experiment results are stored with trial outputs for comparison
  3. Execution metadata is recorded (timestamps, duration, resource usage)
  4. User is notified of completion status

Key Concepts

Variable Substitution

Workflows can define variables that are substituted at execution time. This enables the same workflow to run with different parameter values across experiments.

Variable Sources (priority order):

  1. Execution-time variables (passed as arguments)
  2. Workflow default variables

Example: A workflow might define ${start_date} and ${end_date} variables. Each experiment sets specific values, and the WEM substitutes them into task configurations before execution.
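The ${start_date} example can be sketched with a small substitution pass. The ${...} syntax comes from the example above; recursing through nested configuration dicts is an assumption.

```python
import re

def substitute(config: dict, variables: dict) -> dict:
    """Sketch of variable substitution: replace ${name} placeholders in
    string values throughout a (possibly nested) task configuration."""
    def sub(value):
        if isinstance(value, str):
            return re.sub(r"\$\{(\w+)\}", lambda m: str(variables[m.group(1)]), value)
        if isinstance(value, dict):
            return {k: sub(v) for k, v in value.items()}
        return value
    return sub(config)
```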

Loop Handling

The WEM supports both Directed Acyclic Graphs (DAGs) and Directed Cyclic Graphs (DCGs). Cyclic workflows are useful for iterative algorithms that repeat until convergence.

Loop Execution:

  • Tasks in a loop execute repeatedly
  • Priority determines the loop entry point
  • Convergence logic is implemented in the tasks themselves
  • The WEM coordinates data flow through each iteration
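A cyclic workflow can be sketched as repeated passes over the loop body, with the tasks themselves reporting convergence. The callable signature and the iteration cap are assumptions for illustration.

```python
def run_loop(loop_tasks: list, max_iterations: int = 50):
    """Sketch of loop handling: execute the loop body repeatedly
    (entry point fixed by priority order), passing state through each
    iteration, until a task signals convergence or the cap is hit.
    Each task is a callable: state -> (new_state, converged)."""
    state = {"value": 0.0}
    for iteration in range(1, max_iterations + 1):
        converged = False
        for task in loop_tasks:        # ordered so the entry point runs first
            state, converged = task(state)
        if converged:
            return state, iteration
    return state, max_iterations
```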

Caching

The WEM integrates with the caching layer to avoid redundant computation. When useCache is enabled on a task:

  1. Before execution, the WEM checks whether identical inputs have been processed before
  2. If a valid cached result exists, the task is skipped
  3. Otherwise, the task executes and the result is cached

This dramatically speeds up experimentation when only part of a workflow changes.
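The cache check can be sketched by keying on a hash of the resolved inputs and parameters. The key scheme and the in-memory store are assumptions; the real WEM delegates to its caching layer.

```python
import hashlib
import json

_cache: dict[str, object] = {}

def run_with_cache(task_fn, inputs: dict, params: dict, use_cache: bool = True):
    """Sketch of input-keyed caching: identical inputs + params produce
    the same key, so a repeat run skips execution and returns the
    stored result. Returns (result, was_cache_hit)."""
    key = hashlib.sha256(
        json.dumps({"inputs": inputs, "params": params}, sort_keys=True).encode()
    ).hexdigest()
    if use_cache and key in _cache:
        return _cache[key], True       # cache hit: task skipped
    result = task_fn(inputs, params)
    _cache[key] = result
    return result, False
```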

Trial Snapshotting

For experiments, the WEM creates immutable trial snapshots:

  • Every task configuration is snapshotted at execution time
  • This ensures experiments are reproducible even if the workflow changes later
  • A trial run three months ago can be re-executed with identical configuration

Execution Modes

Test Execution

Run a workflow directly without creating an experiment. Useful for:

  • Testing workflow logic during development
  • Quick validation of model outputs
  • Debugging task configurations

Experiment Execution

Run a workflow as part of a structured experiment. This mode:

  • Creates or loads a trial
  • Snapshots the configuration
  • Stores results for systematic comparison
  • Enables parameter variation across multiple runs

Scheduled Execution

Workflows can be scheduled to run automatically:

  • Cron schedules — Periodic execution (e.g., daily, weekly)
  • One-off schedules — Single execution at a specific time
  • Event-triggered — Execution when external data changes
  • Startup execution — Run when the system initializes

Failure Handling

The WEM implements robust failure handling:

Task-Level Failures

  • Failed tasks transition to FAILED state
  • Logs and error messages are captured
  • Downstream tasks that depend on the failed task are blocked
  • Independent task branches continue execution

Retry Logic

Tasks can be configured with retry policies:

  • Maximum retry attempts
  • Retry delays
  • Exponential backoff
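A retry policy combining all three settings can be sketched as follows; the parameter names are illustrative, not the WEM's configuration keys.

```python
import time

def run_with_retries(task_fn, max_attempts: int = 3, base_delay: float = 0.01):
    """Sketch of retry logic with exponential backoff: the delay doubles
    after each failed attempt; the final failure propagates so the task
    transitions to FAILED."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task_fn()
        except Exception:
            if attempt == max_attempts:
                raise                  # retries exhausted: surface the error
            time.sleep(base_delay * 2 ** (attempt - 1))
```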

Partial Failures

When some tasks succeed and others fail:

  • Successful results are still stored
  • Users can review partial outputs
  • Failed tasks can be individually re-run after fixing issues

Performance Considerations

Parallel Execution

The WEM executes independent tasks in parallel when possible:

  • Tasks with no dependencies between them run concurrently
  • Resource limits are respected
  • Execution is distributed across available compute

Data Transfer Optimization

The WEM selects the most efficient data transfer mode based on:

  • Data size — Large datasets use file-based transfer, small data uses cache
  • Downstream needs — Multiple consumers prefer shared file storage
  • Performance requirements — Real-time needs favor cache over file I/O
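The selection heuristics above can be sketched as a small decision function. The size threshold and the precedence among the three criteria are assumptions for illustration.

```python
def pick_transfer_mode(size_bytes: int, consumers: int, realtime: bool,
                       large_threshold: int = 50 * 1024 * 1024) -> str:
    """Sketch of transfer-mode selection: large or multi-consumer data
    goes through shared file storage; small, latency-sensitive data
    stays in the cache."""
    if size_bytes >= large_threshold:
        return "file"    # large datasets use file-based transfer
    if consumers > 1:
        return "file"    # multiple consumers prefer shared file storage
    if realtime:
        return "cache"   # real-time needs favor cache over file I/O
    return "cache"       # small, single-consumer default
```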

Resource Management

  • Model containers are invoked on demand
  • Idle resources can be scaled down
  • Critical path tasks are prioritized to minimize overall workflow duration

See Also

User documentation for Optimal Reality