
Workflow Execution Manager

[Platform architecture diagram: Platform Users (engineers and low-code ops users) and the ORA AI planning interface sit above the SDK layers — FDK frontend shell, DDK schema definition and code generation, MDK (WEM, DAL, Experiment Manager), Nexus deployment control, and SCDK source control — backed by a GraphQL Federation Gateway API and domain microservices. Nexus deploys the resulting OR applications (rail, mine, and port operations dashboards) used by operations teams.]

Overview

The Workflow Execution Manager (WEM) is the core runtime engine of the MDK. It orchestrates the execution of computational workflows, managing task scheduling, state transitions, variable substitution, and output coordination across heterogeneous model microservices.

Why the WEM Exists

Traditional workflow engines like Apache Airflow are designed for homogeneous compute environments — tasks that share a Python environment, access the same filesystem, and produce outputs in known formats. The MDK's models are deliberately heterogeneous: a Julia simulation doesn't share an environment with a Python demand forecaster or a Go route optimizer.

The WEM was built to orchestrate isolated microservices that communicate over HTTP. It understands the OR model contract (Input, Params, Output for every model) and can automatically route data between tasks written in different languages without requiring custom integration code.
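To make the contract concrete, here is a minimal sketch of what the Input/Params/Output shape could look like from the WEM's side. The class and function names (`ModelRequest`, `invoke_model`, `demand_forecaster`) are illustrative assumptions, not the actual WEM API; the real invocation is an HTTP POST to a containerized endpoint.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical sketch of the OR model contract: every model accepts
# Input + Params and returns Output, regardless of its implementation language.
@dataclass
class ModelRequest:
    input: dict[str, Any]    # upstream task outputs, routed by the WEM
    params: dict[str, Any]   # resolved configuration for this run

@dataclass
class ModelResponse:
    output: dict[str, Any]   # payload routed to downstream tasks
    logs: list[str] = field(default_factory=list)

def invoke_model(handler, request: ModelRequest) -> ModelResponse:
    """Stand-in for the HTTP POST the WEM issues to a model container."""
    return handler(request)

# A toy "model" written against the contract:
def demand_forecaster(req: ModelRequest) -> ModelResponse:
    horizon = req.params.get("horizon", 7)
    return ModelResponse(output={"forecast": [req.input["baseline"]] * horizon})
```

Because every model speaks this shape, the WEM can wire a Julia simulation's Output into a Python forecaster's Input without bespoke glue code.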

Core Responsibilities

The WEM handles:

  • Task Scheduling — Determining execution order based on dependencies
  • State Management — Tracking task status (PENDING, IN_PROGRESS, SUCCESSFUL, FAILED)
  • Variable Substitution — Applying parameter values to task configurations
  • Data Routing — Coordinating output-to-input mappings through the Data Abstraction Layer
  • Execution Coordination — Invoking models, operators, and data connections
  • Loop Handling — Supporting iterative workflows for convergence problems
  • Result Capture — Storing outputs, logs, and metadata for comparison

Execution Lifecycle

1. Workflow Initialization

When a workflow or experiment is triggered:

  1. The WEM loads the complete workflow definition including all tasks and dependencies
  2. If executing an experiment, it creates or loads a trial and snapshots the configuration
  3. Variables are merged from multiple sources (execution-time values, workflow defaults)
  4. All tasks are initialized to PENDING state
  5. A unique execution ID is generated

2. Task Scheduling

The WEM builds a dependency graph and identifies execution-ready tasks:

  1. Dependency Analysis — Creates an adjacency list from task dependencies
  2. Entry Point Identification — Finds tasks with no upstream dependencies
  3. Priority-Based Queuing — Orders ready tasks by priority (higher priority first)
  4. Execution Loop Begins — Processes tasks as they become ready

Priority-Based Scheduling ensures that when multiple tasks are ready simultaneously, critical path tasks execute first. Priority also serves as the cycle-breaking mechanism for loops — in cyclic dependencies, the higher-priority task becomes the loop entry point.
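The scheduling steps above can be sketched with an adjacency view and a priority heap. The task-dict structure (`state`, `depends_on`, `priority`) is an assumption for illustration.

```python
import heapq

def ready_tasks(tasks: dict) -> list[str]:
    """Sketch of dependency analysis + priority-based queuing: a PENDING
    task is ready when all of its upstream dependencies are SUCCESSFUL;
    ready tasks are returned highest priority first."""
    done = {name for name, t in tasks.items() if t.get("state") == "SUCCESSFUL"}
    heap = []
    for name, task in tasks.items():
        if task.get("state") != "PENDING":
            continue
        if all(dep in done for dep in task.get("depends_on", [])):
            # negate priority so the highest priority pops first
            heapq.heappush(heap, (-task.get("priority", 0), name))
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]
```

Entry points fall out naturally: a task with an empty `depends_on` list is ready on the first pass.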

3. Task Execution

For each ready task, the WEM follows a consistent execution pattern:

Configuration Resolution

  1. Merge configuration values from multiple layers:
    • Base model configuration (defaults)
    • Workflow task overrides
    • Experiment-specific overrides
    • Trial snapshots (for experiments)
  2. Apply variable substitution using merged variables
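The layered merge can be sketched as a fold over the configuration layers, later layers winning. A shallow merge is an assumption here; the WEM may merge nested configuration more deeply.

```python
def resolve_config(base: dict, *overrides: dict) -> dict:
    """Sketch of configuration resolution: start from the base model
    defaults and apply each override layer in order (workflow task
    overrides, experiment overrides, trial snapshots)."""
    merged = dict(base)
    for layer in overrides:
        merged.update(layer)   # later layers take precedence
    return merged
```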

Execution Path Selection

The WEM determines how to execute the task based on its component type:

Component Type — Execution Method:

  • MODEL — HTTP POST to containerized model endpoint
  • OPERATOR — Execute custom Python function
  • IMPORT_DDK — Query external data APIs, then execute import function
  • EXPORT_DDK — Execute export function, then call external APIs
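The dispatch above can be sketched as a switch on component type. The handler names are hypothetical stand-ins for the real execution paths.

```python
def execute_task(task: dict, handlers: dict):
    """Illustrative dispatch mirroring the component-type table."""
    kind = task["component_type"]
    if kind == "MODEL":
        return handlers["http_post"](task)         # POST to model container
    if kind == "OPERATOR":
        return handlers["run_python"](task)        # custom Python function
    if kind == "IMPORT_DDK":
        data = handlers["query_external"](task)    # fetch from external API
        return handlers["run_import"](task, data)
    if kind == "EXPORT_DDK":
        data = handlers["run_export"](task)        # build export payload
        return handlers["call_external"](task, data)
    raise ValueError(f"unknown component type: {kind}")
```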

Task Invocation

  1. Task state transitions to IN_PROGRESS
  2. The appropriate execution handler is invoked
  3. Output data, metadata, and logs are captured
  4. Task state transitions to SUCCESSFUL or FAILED
  5. For experiments, results are persisted to trial storage

4. Dependency Resolution

After each task completes, the WEM identifies downstream tasks that can now execute:

  1. Output is stored to the Data Abstraction Layer (file system, cache, or database)
  2. Dependent tasks are identified from the dependency graph
  3. Readiness is checked — all upstream dependencies must be SUCCESSFUL
  4. Ready tasks are queued for execution in priority order
  5. Process repeats until all tasks complete or a failure blocks progress

5. Completion and Result Storage

When all tasks have completed:

  1. Final state is determined (fully successful vs. partial failures)
  2. Experiment results are stored with trial outputs for comparison
  3. Execution metadata is recorded (timestamps, duration, resource usage)
  4. User is notified of completion status

Key Concepts

Variable Substitution

Workflows can define variables that are substituted at execution time. This enables the same workflow to run with different parameter values across experiments.

Variable Sources (priority order):

  1. Execution-time variables (passed as arguments)
  2. Workflow default variables

Example: A workflow might define ${start_date} and ${end_date} variables. Each experiment sets specific values, and the WEM substitutes them into task configurations before execution.
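The ${start_date} example can be sketched with a small substitution pass. The ${...} syntax comes from the example above; recursing through nested configuration dicts is an assumption.

```python
import re

def substitute(config: dict, variables: dict) -> dict:
    """Sketch of variable substitution: replace ${name} placeholders in
    string values throughout a (possibly nested) task configuration."""
    def sub(value):
        if isinstance(value, str):
            return re.sub(r"\$\{(\w+)\}", lambda m: str(variables[m.group(1)]), value)
        if isinstance(value, dict):
            return {k: sub(v) for k, v in value.items()}
        return value
    return sub(config)
```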

Loop Handling

The WEM supports both Directed Acyclic Graphs (DAGs) and Directed Cyclic Graphs (DCGs). Cyclic workflows are useful for iterative algorithms that repeat until convergence.

Loop Execution:

  • Tasks in a loop execute repeatedly
  • Priority determines the loop entry point
  • Convergence logic is implemented in the tasks themselves
  • The WEM coordinates data flow through each iteration
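A cyclic workflow can be sketched as repeated passes over the loop body, with the tasks themselves reporting convergence. The callable signature and the iteration cap are assumptions for illustration.

```python
def run_loop(loop_tasks: list, max_iterations: int = 50):
    """Sketch of loop handling: execute the loop body repeatedly
    (entry point fixed by priority order), passing state through each
    iteration, until a task signals convergence or the cap is hit.
    Each task is a callable: state -> (new_state, converged)."""
    state = {"value": 0.0}
    for iteration in range(1, max_iterations + 1):
        converged = False
        for task in loop_tasks:        # ordered so the entry point runs first
            state, converged = task(state)
        if converged:
            return state, iteration
    return state, max_iterations
```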

Caching

The WEM integrates with the caching layer to avoid redundant computation. When useCache is enabled on a task:

  1. Before execution, the WEM checks whether identical inputs have been processed before
  2. If a valid cached result exists, the task is skipped
  3. Otherwise, the task executes and the result is cached

This dramatically speeds up experimentation when only part of a workflow changes.
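The cache check can be sketched by keying on a hash of the resolved inputs and parameters. The key scheme and the in-memory store are assumptions; the real WEM delegates to its caching layer.

```python
import hashlib
import json

_cache: dict[str, object] = {}

def run_with_cache(task_fn, inputs: dict, params: dict, use_cache: bool = True):
    """Sketch of input-keyed caching: identical inputs + params produce
    the same key, so a repeat run skips execution and returns the
    stored result. Returns (result, was_cache_hit)."""
    key = hashlib.sha256(
        json.dumps({"inputs": inputs, "params": params}, sort_keys=True).encode()
    ).hexdigest()
    if use_cache and key in _cache:
        return _cache[key], True       # cache hit: task skipped
    result = task_fn(inputs, params)
    _cache[key] = result
    return result, False
```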

Trial Snapshotting

For experiments, the WEM creates immutable trial snapshots:

  • Every task configuration is snapshotted at execution time
  • This ensures experiments are reproducible even if the workflow changes later
  • A trial run three months ago can be re-executed with identical configuration

Execution Modes

Test Execution

Run a workflow directly without creating an experiment. Useful for:

  • Testing workflow logic during development
  • Quick validation of model outputs
  • Debugging task configurations

Experiment Execution

Run a workflow as part of a structured experiment. This mode:

  • Creates or loads a trial
  • Snapshots the configuration
  • Stores results for systematic comparison
  • Enables parameter variation across multiple runs

Scheduled Execution

Workflows can be scheduled to run automatically:

  • Cron schedules — Periodic execution (e.g., daily, weekly)
  • One-off schedules — Single execution at a specific time
  • Event-triggered — Execution when external data changes
  • Startup execution — Run when the system initializes

Failure Handling

The WEM implements robust failure handling:

Task-Level Failures

  • Failed tasks transition to FAILED state
  • Logs and error messages are captured
  • Downstream tasks that depend on the failed task are blocked
  • Independent task branches continue execution

Retry Logic

Tasks can be configured with retry policies:

  • Maximum retry attempts
  • Retry delays
  • Exponential backoff
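A retry policy combining all three settings can be sketched as follows; the parameter names are illustrative, not the WEM's configuration keys.

```python
import time

def run_with_retries(task_fn, max_attempts: int = 3, base_delay: float = 0.01):
    """Sketch of retry logic with exponential backoff: the delay doubles
    after each failed attempt; the final failure propagates so the task
    transitions to FAILED."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task_fn()
        except Exception:
            if attempt == max_attempts:
                raise                  # retries exhausted: surface the error
            time.sleep(base_delay * 2 ** (attempt - 1))
```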

Partial Failures

When some tasks succeed and others fail:

  • Successful results are still stored
  • Users can review partial outputs
  • Failed tasks can be individually re-run after fixing issues

Performance Considerations

Parallel Execution

The WEM executes independent tasks in parallel when possible:

  • Tasks with no dependencies between them run concurrently
  • Resource limits are respected
  • Execution is distributed across available compute

Data Transfer Optimization

The WEM selects the most efficient data transfer mode based on:

  • Data size — Large datasets use file-based transfer, small data uses cache
  • Downstream needs — Multiple consumers prefer shared file storage
  • Performance requirements — Real-time needs favor cache over file I/O
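The selection heuristics above can be sketched as a small decision function. The size threshold and the precedence among the three criteria are assumptions for illustration.

```python
def pick_transfer_mode(size_bytes: int, consumers: int, realtime: bool,
                       large_threshold: int = 50 * 1024 * 1024) -> str:
    """Sketch of transfer-mode selection: large or multi-consumer data
    goes through shared file storage; small, latency-sensitive data
    stays in the cache."""
    if size_bytes >= large_threshold:
        return "file"    # large datasets use file-based transfer
    if consumers > 1:
        return "file"    # multiple consumers prefer shared file storage
    if realtime:
        return "cache"   # real-time needs favor cache over file I/O
    return "cache"       # small, single-consumer default
```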

Resource Management

  • Model containers are invoked on demand
  • Idle resources can be scaled down
  • Critical path tasks are prioritized to minimize overall workflow duration

See Also

User documentation for Optimal Reality