Workflow Execution Manager
Overview
The Workflow Execution Manager (WEM) is the core runtime engine of the MDK. It orchestrates the execution of computational workflows, managing task scheduling, state transitions, variable substitution, and output coordination across heterogeneous model microservices.
Why the WEM Exists
Traditional workflow engines like Apache Airflow are designed for homogeneous compute environments — tasks that share a Python environment, access the same filesystem, and produce outputs in known formats. The MDK's models are deliberately heterogeneous: a Julia simulation doesn't share an environment with a Python demand forecaster or a Go route optimizer.
The WEM was built to orchestrate isolated microservices that communicate over HTTP. It understands the OR model contract (Input, Params, Output for every model) and can automatically route data between tasks written in different languages without requiring custom integration code.
Core Responsibilities
The WEM handles:
- Task Scheduling — Determining execution order based on dependencies
- State Management — Tracking task status (PENDING, IN_PROGRESS, SUCCESSFUL, FAILED)
- Variable Substitution — Applying parameter values to task configurations
- Data Routing — Coordinating output-to-input mappings through the Data Abstraction Layer
- Execution Coordination — Invoking models, operators, and data connections
- Loop Handling — Supporting iterative workflows for convergence problems
- Result Capture — Storing outputs, logs, and metadata for comparison
Execution Lifecycle
1. Workflow Initialization
When a workflow or experiment is triggered:
- The WEM loads the complete workflow definition including all tasks and dependencies
- If executing an experiment, it creates or loads a trial and snapshots the configuration
- Variables are merged from multiple sources (execution-time values, workflow defaults)
- All tasks are initialized to PENDING state
- A unique execution ID is generated
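A minimal sketch of the initialization step follows. It assumes variables are plain dictionaries and task states are strings. `initialize_execution` and `Execution` are hypothetical names, not the WEM's actual API. Execution-time values override workflow defaults, which matches the priority order described under Variable Substitution.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Execution:
    """Hypothetical in-memory record of one workflow execution."""
    execution_id: str
    variables: dict
    task_states: dict = field(default_factory=dict)


def initialize_execution(task_ids, workflow_defaults, runtime_variables):
    # Execution-time values override workflow defaults on key collisions.
    variables = {**workflow_defaults, **runtime_variables}
    # Every task starts in PENDING before any scheduling happens.
    states = {task_id: "PENDING" for task_id in task_ids}
    # uuid4 produces a random, practically unique execution ID.
    return Execution(str(uuid.uuid4()), variables, states)


run = initialize_execution(
    ["load", "forecast", "optimize"],
    workflow_defaults={"start_date": "2024-01-01", "end_date": "2024-12-31"},
    runtime_variables={"end_date": "2024-06-30"},
)
print(run.variables)  # {'start_date': '2024-01-01', 'end_date': '2024-06-30'}
```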
2. Task Scheduling
The WEM builds a dependency graph and identifies execution-ready tasks:
- Dependency Analysis — Creates an adjacency list from task dependencies
- Entry Point Identification — Finds tasks with no upstream dependencies
- Priority-Based Queuing — Orders ready tasks by priority (higher priority first)
- Execution Loop Begins — Processes tasks as they become ready
Priority-Based Scheduling ensures that when multiple tasks are ready at the same time, higher-priority tasks (for example, those on the critical path) execute first. Priority also breaks cycles in looping workflows: within a cyclic dependency, the higher-priority task becomes the loop entry point.
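The scheduling steps above can be sketched with an adjacency list and Python's `heapq`. This is an illustration only. It assumes dependencies are given as a mapping from each task to its upstream tasks, and that ties between equal priorities are broken alphabetically. `build_graph` and `entry_queue` are hypothetical helpers.

```python
import heapq
from collections import defaultdict


def build_graph(dependencies):
    """dependencies maps each task to the tasks it depends on (its upstreams)."""
    downstream = defaultdict(list)  # adjacency list: upstream -> [downstream tasks]
    for task, upstreams in dependencies.items():
        for up in upstreams:
            downstream[up].append(task)
    return downstream


def entry_queue(dependencies, priority):
    """Heap of tasks with no upstream dependencies, highest priority first."""
    heap = [(-priority[t], t) for t, ups in dependencies.items() if not ups]
    heapq.heapify(heap)  # heapq is a min-heap, so priorities are negated
    return heap


deps = {"load_demand": [], "load_network": [], "forecast": ["load_demand"],
        "optimize": ["forecast", "load_network"]}
prio = {"load_demand": 10, "load_network": 5, "forecast": 1, "optimize": 1}

graph = build_graph(deps)
queue = entry_queue(deps, prio)
order = [heapq.heappop(queue)[1] for _ in range(len(queue))]
print(order)  # ['load_demand', 'load_network']
```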
3. Task Execution
For each ready task, the WEM follows a consistent execution pattern:
Configuration Resolution
- Merge configuration values from multiple layers:
  - Base model configuration (defaults)
  - Workflow task overrides
  - Experiment-specific overrides
  - Trial snapshots (for experiments)
- Apply variable substitution using merged variables
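Layered configuration resolution can be sketched as a recursive dictionary merge. This example assumes that later layers take precedence over earlier ones, in the order listed above, and that nested dictionaries merge key by key. Both are assumptions, since the document does not state the precedence rules. `deep_merge` and `resolve_config` are hypothetical names.

```python
def deep_merge(base, override):
    """Return a new dict where override wins; nested dicts merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(*layers):
    """Apply layers in order; later layers take precedence (assumed)."""
    resolved = {}
    for layer in layers:
        if layer:  # skip absent layers, e.g. no trial snapshot in a test run
            resolved = deep_merge(resolved, layer)
    return resolved


base = {"solver": {"tolerance": 1e-6, "max_iter": 100}, "region": "north"}
workflow_override = {"solver": {"max_iter": 500}}
experiment_override = {"region": "south"}
resolved = resolve_config(base, workflow_override, experiment_override, None)
print(resolved)  # {'solver': {'tolerance': 1e-06, 'max_iter': 500}, 'region': 'south'}
```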
Execution Path Selection
The WEM determines how to execute the task based on its component type:
| Component Type | Execution Method |
|---|---|
| MODEL | HTTP POST to containerized model endpoint |
| OPERATOR | Execute custom Python function |
| IMPORT_DDK | Query external data APIs, then execute import function |
| EXPORT_DDK | Execute export function, then call external APIs |
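The table above amounts to a dispatch on component type. The following is a sketch using a handler registry, not the WEM's real code. The task dictionary shape, the `endpoint`/`payload`/`timeout` config keys, and `run_model` are hypothetical. The MODEL handler uses only the standard library's `urllib.request` to POST JSON.

```python
import json
import urllib.request
from enum import Enum


class ComponentType(Enum):
    MODEL = "MODEL"
    OPERATOR = "OPERATOR"
    IMPORT_DDK = "IMPORT_DDK"
    EXPORT_DDK = "EXPORT_DDK"


def run_model(config):
    """Hypothetical MODEL handler: POST the payload as JSON to the container endpoint."""
    request = urllib.request.Request(
        config["endpoint"],
        data=json.dumps(config["payload"]).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=config.get("timeout", 60)) as response:
        return json.loads(response.read())


def execute(task, handlers):
    """Pick the handler registered for the task's component type and call it."""
    kind = task["component_type"]
    try:
        handler = handlers[ComponentType(kind)]
    except (ValueError, KeyError) as err:  # unknown type, or no handler registered
        raise ValueError(f"no handler for component type {kind!r}") from err
    return handler(task["config"])


handlers = {
    ComponentType.MODEL: run_model,
    ComponentType.OPERATOR: lambda config: sum(config["values"]),  # stand-in operator
}
result = execute({"component_type": "OPERATOR", "config": {"values": [1, 2, 3]}}, handlers)
print(result)  # 6
```

A registry keeps the executor independent of how each component type is implemented. Adding a new execution path means registering another handler.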
Task Invocation
- Task state transitions to IN_PROGRESS
- The appropriate execution handler is invoked
- Output data, metadata, and logs are captured
- Task state transitions to SUCCESSFUL or FAILED
- For experiments, results are persisted to trial storage
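The invocation steps can be sketched as a wrapper that records each state transition and captures the task's output, logs, and errors. The handler signature `(config, logs)` and the dictionary-based `trial_store` are assumptions made for illustration.

```python
import traceback


def invoke(task_id, handler, config, states, results, trial_store=None):
    """Run one task, tracking IN_PROGRESS -> SUCCESSFUL/FAILED and capturing output."""
    states[task_id] = "IN_PROGRESS"
    logs = []  # assumed convention: handlers append log lines here
    try:
        output = handler(config, logs)
    except Exception:
        states[task_id] = "FAILED"
        results[task_id] = {"output": None, "logs": logs, "error": traceback.format_exc()}
    else:
        states[task_id] = "SUCCESSFUL"
        results[task_id] = {"output": output, "logs": logs, "error": None}
    if trial_store is not None:  # experiments persist results to trial storage
        trial_store[task_id] = results[task_id]
    return states[task_id]


def scale(config, logs):
    logs.append(f"scaling by {config['factor']}")
    return [x * config["factor"] for x in config["values"]]


def broken(config, logs):
    raise RuntimeError("model endpoint returned 500")


states, results, trial = {}, {}, {}
invoke("scale", scale, {"factor": 2, "values": [1, 2]}, states, results, trial)
invoke("broken", broken, {}, states, results, trial)
print(states)  # {'scale': 'SUCCESSFUL', 'broken': 'FAILED'}
```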
4. Dependency Resolution
After each task completes, the WEM identifies downstream tasks that can now execute:
- Output is written to the Data Abstraction Layer (file system, cache, or database)
- Dependent tasks are identified from the dependency graph
- Readiness is checked — all upstream dependencies must be SUCCESSFUL
- Ready tasks are queued for execution in priority order
- Process repeats until all tasks complete or a failure blocks progress
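The readiness check can be sketched as follows. When a task finishes, only its direct downstream tasks need to be examined. A task is queued once all of its upstreams are SUCCESSFUL. `on_task_complete` and the graph dictionaries are hypothetical, and the priority queue uses negated priorities with `heapq`.

```python
import heapq


def on_task_complete(completed, downstream, upstream, states, priority, queue):
    """Queue downstream tasks of `completed` whose upstreams are all SUCCESSFUL."""
    for task in downstream.get(completed, []):
        if states[task] == "PENDING" and all(
            states[up] == "SUCCESSFUL" for up in upstream[task]
        ):
            heapq.heappush(queue, (-priority[task], task))


upstream = {"load_demand": [], "load_network": [], "forecast": ["load_demand"],
            "optimize": ["forecast", "load_network"]}
downstream = {"load_demand": ["forecast"], "forecast": ["optimize"],
              "load_network": ["optimize"]}
priority = {"load_demand": 10, "load_network": 5, "forecast": 1, "optimize": 1}
states = {task: "PENDING" for task in upstream}
queue = []

states["load_demand"] = "SUCCESSFUL"
on_task_complete("load_demand", downstream, upstream, states, priority, queue)
print(queue)  # [(-1, 'forecast')]
heapq.heappop(queue)
states["forecast"] = "SUCCESSFUL"
on_task_complete("forecast", downstream, upstream, states, priority, queue)
print(queue)  # [] because optimize still waits on load_network
states["load_network"] = "SUCCESSFUL"
on_task_complete("load_network", downstream, upstream, states, priority, queue)
print(queue)  # [(-1, 'optimize')]
```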
5. Completion and Result Storage
When all tasks have completed:
- Final state is determined (fully successful vs. partial failures)
- Experiment results are stored with trial outputs for comparison
- Execution metadata is recorded (timestamps, duration, resource usage)
- User is notified of completion status
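Determining the final state can be sketched as a count over task states. The status label `PARTIAL_FAILURE` and the summary fields are hypothetical. Resource usage is omitted because the document does not say how it is measured.

```python
from collections import Counter
from datetime import datetime, timezone


def summarize(states, started_at, finished_at):
    """Classify an execution from its task states and record its duration."""
    counts = Counter(states.values())
    if counts["SUCCESSFUL"] == len(states):
        status = "SUCCESSFUL"
    elif counts["SUCCESSFUL"] > 0:
        status = "PARTIAL_FAILURE"  # hypothetical label for mixed outcomes
    else:
        status = "FAILED"
    return {
        "status": status,
        "task_counts": dict(counts),
        "duration_s": (finished_at - started_at).total_seconds(),
    }


summary = summarize(
    {"load": "SUCCESSFUL", "forecast": "FAILED", "optimize": "PENDING"},  # optimize was blocked
    started_at=datetime(2024, 5, 1, 12, 0, 0, tzinfo=timezone.utc),
    finished_at=datetime(2024, 5, 1, 12, 1, 30, tzinfo=timezone.utc),
)
print(summary["status"], summary["duration_s"])  # PARTIAL_FAILURE 90.0
```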
Key Concepts
Variable Substitution
Workflows can define variables that are substituted at execution time. This enables the same workflow to run with different parameter values across experiments.
Variable Sources (highest priority first):
- Execution-time variables (passed as arguments)
- Workflow default variables
Example: A workflow might define `${start_date}` and `${end_date}` variables. Each experiment sets specific values, and the WEM substitutes them into task configurations before execution.
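The `${name}` placeholder syntax happens to match Python's standard `string.Template`. This makes it a convenient way to sketch substitution, although the WEM's actual substitution engine may differ. The example walks nested dicts and lists and replaces placeholders in strings only. `substitute` is a hypothetical helper.

```python
from string import Template


def substitute(value, variables):
    """Recursively replace ${name} placeholders in strings inside dicts and lists."""
    if isinstance(value, str):
        return Template(value).substitute(variables)  # KeyError if a variable is undefined
    if isinstance(value, dict):
        return {key: substitute(item, variables) for key, item in value.items()}
    if isinstance(value, list):
        return [substitute(item, variables) for item in value]
    return value  # numbers, booleans, and None pass through unchanged


defaults = {"start_date": "2024-01-01", "end_date": "2024-12-31"}
runtime = {"end_date": "2024-03-31"}
variables = {**defaults, **runtime}  # execution-time values win

task_config = {"query": {"from": "${start_date}", "to": "${end_date}"}, "horizon_days": 30}
resolved_task = substitute(task_config, variables)
print(resolved_task)  # {'query': {'from': '2024-01-01', 'to': '2024-03-31'}, 'horizon_days': 30}
```

Using `substitute` rather than `safe_substitute` makes a misspelled variable fail loudly instead of leaving a literal `${...}` in the configuration. Note that `string.Template` treats `$$` as an escaped dollar sign.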
Loop Handling
The WEM supports both Directed Acyclic Graphs (DAGs) and Directed Cyclic Graphs (DCGs). Cyclic workflows are useful for iterative algorithms that repeat until convergence.
Loop Execution:
- Tasks in a loop execute repeatedly
- Priority determines the loop entry point
- Convergence logic is implemented in the tasks themselves
- The WEM coordinates data flow through each iteration
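A cyclic workflow can be sketched as repeated passes over the tasks in a cycle, entered at the highest-priority task, with each task deciding convergence. The `converged` flag convention and the `max_iterations` safety cap are assumptions, not documented WEM behavior. The example uses Newton's method for the square root of 2 as a stand-in iterative model.

```python
def run_cycle(cycle, tasks, priority, state, max_iterations=50):
    """Repeat a cycle of tasks until one of them reports convergence.

    `cycle` lists task names in edge order (e.g. ["check", "update"] means
    check -> update -> check). Tasks take and return a state dict.
    """
    start = max(range(len(cycle)), key=lambda i: priority[cycle[i]])
    order = cycle[start:] + cycle[:start]  # rotate so the entry task runs first
    for iteration in range(1, max_iterations + 1):
        for name in order:
            state = tasks[name](state)
        if state.get("converged"):
            return state, iteration
    raise RuntimeError(f"no convergence after {max_iterations} iterations")


def newton_update(state):
    x = state["x"]
    return {**state, "previous": x, "x": x - (x * x - 2) / (2 * x)}


def check_convergence(state):
    return {**state, "converged": abs(state["x"] - state["previous"]) < 1e-12}


tasks = {"update": newton_update, "check": check_convergence}
final, iterations = run_cycle(["check", "update"], tasks, {"update": 10, "check": 1}, {"x": 1.0})
print(round(final["x"], 10))  # 1.4142135624
```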
Caching
The WEM integrates with the caching layer to avoid redundant computation. When `useCache` is enabled on a task:
- Before execution, the WEM checks whether identical inputs have already been processed
- If a valid cached result exists, the task is skipped and the cached output is used instead
- If no cached result exists, the task executes and its result is cached
- When only part of a workflow changes, tasks whose inputs are unchanged are served from the cache, which can greatly speed up experimentation
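A minimal sketch of input-keyed caching is shown below. It assumes the cache key is a SHA-256 hash of the task ID, resolved configuration, and inputs serialized as canonical JSON, and that the cache is a plain dictionary. Cache validity rules such as expiry are not modeled, so any hit is treated as valid. `cache_key` and `run_with_cache` are hypothetical.

```python
import hashlib
import json


def cache_key(task_id, config, inputs):
    """Deterministic fingerprint of everything assumed to determine a task's output."""
    payload = json.dumps({"task": task_id, "config": config, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_with_cache(task_id, fn, config, inputs, cache, use_cache=True):
    if not use_cache:
        return fn(config, inputs)
    key = cache_key(task_id, config, inputs)
    if key in cache:
        return cache[key]  # cache hit: the task itself is skipped
    result = fn(config, inputs)
    cache[key] = result
    return result


calls = []


def forecast(config, inputs):
    calls.append(1)  # count real executions
    return [v * config["growth"] for v in inputs]


cache = {}
run_with_cache("forecast", forecast, {"growth": 1.1}, [100, 200], cache)
run_with_cache("forecast", forecast, {"growth": 1.1}, [100, 200], cache)  # served from cache
run_with_cache("forecast", forecast, {"growth": 1.2}, [100, 200], cache)  # config changed: re-run
print(len(calls))  # 2
```

Using `sort_keys=True` makes the key independent of dictionary insertion order, so logically identical configurations hash the same.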
Trial Snapshotting
For experiments, the WEM creates immutable trial snapshots:
- Every task configuration is snapshotted at execution time
- This ensures experiments are reproducible even if the workflow changes later
- A trial run three months ago can be re-executed with identical configuration
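One way to make a snapshot immune to later workflow edits is to serialize it to canonical JSON and store that string with its content hash. This sketch assumes task configurations are JSON-serializable. `snapshot` and `restore` are hypothetical, and `MappingProxyType` is used only to make the snapshot record read-only.

```python
import hashlib
import json
from types import MappingProxyType


def snapshot(task_configs):
    """Freeze task configs as canonical JSON plus a SHA-256 content hash."""
    frozen = json.dumps(task_configs, sort_keys=True)
    return MappingProxyType({
        "configs_json": frozen,
        "sha256": hashlib.sha256(frozen.encode("utf-8")).hexdigest(),
    })


def restore(snap):
    """Return a fresh, independent copy of the snapshotted configs for re-execution."""
    return json.loads(snap["configs_json"])


workflow = {"forecast": {"growth": 1.1}}
trial = snapshot(workflow)
workflow["forecast"]["growth"] = 1.5  # the workflow is edited later
print(restore(trial))  # {'forecast': {'growth': 1.1}}
```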
Execution Modes
Test Execution
Run a workflow directly without creating an experiment. Useful for:
- Testing workflow logic during development
- Quick validation of model outputs
- Debugging task configurations
Experiment Execution
Run a workflow as part of a structured experiment. This mode:
- Creates or loads a trial
- Snapshots the configuration
- Stores results for systematic comparison
- Enables parameter variation across multiple runs
Scheduled Execution
Workflows can be scheduled to run automatically:
- Cron schedules — Periodic execution (e.g., daily, weekly)
- One-off schedules — Single execution at a specific time
- Event-triggered — Execution when external data changes
- Startup execution — Run when the system initializes
Failure Handling
The WEM handles failures through task-level isolation, configurable retries, and preservation of partial results:
Task-Level Failures
- Failed tasks transition to FAILED state
- Logs and error messages are captured
- Downstream tasks that depend on the failed task are blocked
- Independent task branches continue execution
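The set of tasks blocked by a failure can be sketched as a breadth-first traversal of the downstream graph from the failed task. Anything not reached belongs to an independent branch and can keep running. The graph shape and `blocked_by` are hypothetical.

```python
from collections import deque


def blocked_by(failed, downstream):
    """All tasks reachable downstream of `failed`; they cannot run in this execution."""
    blocked, frontier = set(), deque([failed])
    while frontier:
        task = frontier.popleft()
        for child in downstream.get(task, []):
            if child not in blocked:
                blocked.add(child)
                frontier.append(child)
    return blocked


downstream = {"load_demand": ["forecast"], "forecast": ["optimize", "report"],
              "load_network": ["optimize"], "optimize": ["export"]}
print(sorted(blocked_by("forecast", downstream)))  # ['export', 'optimize', 'report']
```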
Retry Logic
Tasks can be configured with retry policies:
- Maximum retry attempts
- Retry delays
- Exponential backoff
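A retry policy with exponential backoff can be sketched as follows. The parameter names are hypothetical, not the WEM's configuration keys. The sketch retries on any exception, whereas a real policy might retry only transient errors. `sleep` is injectable so the delays can be inspected without waiting.

```python
import time


def run_with_retries(fn, max_attempts=3, base_delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call fn until it succeeds or max_attempts is reached.

    Waits base_delay, base_delay * backoff, base_delay * backoff**2, ... between attempts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of retries: the caller marks the task FAILED
            sleep(base_delay * backoff ** (attempt - 1))


attempts, delays = [], []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("model container not ready")
    return "ok"


result = run_with_retries(flaky, max_attempts=5, base_delay=0.5, sleep=delays.append)
print(result, delays)  # ok [0.5, 1.0]
```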
Partial Failures
When some tasks succeed and others fail:
- Successful results are still stored
- Users can review partial outputs
- Failed tasks can be individually re-run after fixing issues
Performance Considerations
Parallel Execution
The WEM executes independent tasks in parallel when possible:
- Tasks with no dependencies between them run concurrently
- Resource limits are respected
- Execution is distributed across available compute
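Concurrent execution of independent ready tasks can be sketched with `concurrent.futures.ThreadPoolExecutor`, where `max_workers` stands in for a resource limit. Threads fit here because model calls are I/O-bound HTTP requests whose real work happens in remote containers. Distributing work across machines is outside the scope of this sketch, and `run_ready_batch` is a hypothetical name.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_ready_batch(ready_tasks, max_workers=4):
    """Run independent ready tasks concurrently, at most max_workers at a time."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fn): name for name, fn in ready_tasks.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = ("SUCCESSFUL", future.result())
            except Exception as err:
                results[name] = ("FAILED", repr(err))
    return results


def call_model(seconds, value):
    def task():
        time.sleep(seconds)  # stands in for waiting on an HTTP response
        return value
    return task


start = time.perf_counter()
out = run_ready_batch({"demand": call_model(0.2, 1), "network": call_model(0.2, 2)}, max_workers=2)
elapsed = time.perf_counter() - start
print(sorted(out.items()))  # [('demand', ('SUCCESSFUL', 1)), ('network', ('SUCCESSFUL', 2))]
```

Both 0.2-second tasks overlap, so the batch takes roughly 0.2 seconds rather than 0.4.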
Data Transfer Optimization
The WEM selects the most efficient data transfer mode based on:
- Data size — Large datasets use file-based transfer; small datasets use the cache
- Downstream needs — Multiple consumers prefer shared file storage
- Performance requirements — Real-time needs favor cache over file I/O
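The selection criteria above can be sketched as a small rule function. The 10 MiB threshold and the order in which the rules are checked are illustrative assumptions, not documented WEM values. `choose_transfer_mode` and `TransferMode` are hypothetical.

```python
from enum import Enum


class TransferMode(Enum):
    CACHE = "cache"
    FILE = "file"


CACHE_LIMIT_BYTES = 10 * 1024 * 1024  # hypothetical threshold: 10 MiB


def choose_transfer_mode(size_bytes, consumers, realtime=False):
    if size_bytes > CACHE_LIMIT_BYTES:
        return TransferMode.FILE   # large data goes through file storage
    if realtime:
        return TransferMode.CACHE  # latency-sensitive: avoid file I/O
    if consumers > 1:
        return TransferMode.FILE   # several consumers share one stored file
    return TransferMode.CACHE


modes = [
    choose_transfer_mode(2_000, consumers=1).value,
    choose_transfer_mode(500 * 1024 * 1024, consumers=1).value,
    choose_transfer_mode(2_000, consumers=3).value,
    choose_transfer_mode(2_000, consumers=3, realtime=True).value,
]
print(modes)  # ['cache', 'file', 'file', 'cache']
```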
Resource Management
- Model containers are invoked on demand
- Idle resources can be scaled down
- Critical path tasks are prioritized to minimize overall workflow duration
See Also
- Data Flow & Transformations — How the DAL coordinates data movement that the WEM orchestrates
- Architecture — How the WEM fits into the broader MDK architecture
- Component Types — The four task execution modes the WEM supports
- Data Schema — The Study → Experiment → Trial hierarchy the WEM operates within
