Data Loader
Overview
The Data Loader service loads data into databases from multiple formats and sources. It enables rapid ingestion from CSV, JSON, and SQL sources, plus mock data generation for testing and development.
This service is designed for data engineers, analysts, and developers who need to move data into databases quickly and reliably for workflows, testing, or production operations.
Key Capabilities
Multi-Format Data Loading
- CSV Import — Load data from comma-separated value files
- JSON Import — Ingest JSON documents and arrays
- SQL Execution — Run SQL scripts for complex loading logic
- Bulk Loading — Efficiently load large datasets
Mock Data Generation
- Synthetic Data — Generate realistic test data
- Schema-Based — Create data matching table schemas
- Configurable Volume — Generate from dozens to millions of rows
- Realistic Values — Names, addresses, dates, and numbers follow realistic patterns
Data Transformation
- Type Conversion — Automatic conversion to database types
- Field Mapping — Rename columns during load
- Filtering — Load only rows matching criteria
- Enrichment — Add computed columns or default values
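The service applies these transformations as part of the load itself; the snippet below is only a stand-alone sketch of the same idea, where the `mappings`, `defaults`, and `transform` names are illustrative rather than part of the Data Loader API.

```python
# Hypothetical sketch: rename columns, coerce types, filter rows, and add a default column.
from datetime import date, datetime

mappings = {"cust_name": "customer_name", "dob": "date_of_birth"}   # old -> new column names
defaults = {"load_date": date.today().isoformat()}                  # enrichment: computed column

def transform(row: dict) -> dict | None:
    """Rename fields, convert types, and drop rows that fail the filter."""
    out = {mappings.get(k, k): v for k, v in row.items()}
    if not out.get("customer_name"):                    # filtering: skip rows without a name
        return None
    out["date_of_birth"] = datetime.strptime(out["date_of_birth"], "%Y-%m-%d").date()
    out.update(defaults)
    return out

rows = [{"cust_name": "Ada", "dob": "1990-01-15"}, {"cust_name": "", "dob": "2000-06-30"}]
clean = [r for r in (transform(row) for row in rows) if r is not None]
print(clean)   # only the first row survives, renamed and enriched
```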
Database Management
- Table Creation — Create tables from data structure
- Schema Updates — Modify existing table schemas
- Index Management — Create indexes for query performance
- Data Export — Extract data to CSV or JSON
Use Cases
Development & Testing
Scenario: Developers need realistic test data in a database to validate application functionality.
Workflow:
- Define database schema
- Generate mock data matching schema
- Load into development database
- Run application tests against realistic data
Value: Test with realistic data without accessing production systems or creating data by hand.
Data Migration
Scenario: Migrate data from legacy systems to new database schema.
Workflow:
- Export data from legacy system to CSV
- Define field mappings (old → new schema)
- Transform and validate data
- Load into new database
- Verify data integrity
Value: Streamline migration with automated transformation and validation.
Workflow Data Preparation
Scenario: A workflow needs specific data in a database to run models.
Workflow:
- Prepare data in CSV or JSON format
- Load data into workflow database
- Run models that query the database
- Export results for analysis
Value: Set up data dependencies automatically as part of workflow execution.
Prototype Development
Scenario: Rapidly prototype a new application feature that requires database interaction.
Workflow:
- Generate mock data for testing
- Load into local database
- Develop and test feature
- Iterate quickly without production data access
Value: Accelerate development with instant, realistic test data.
Model Inputs
The Data Loader service accepts:
- Data Source — CSV file, JSON document, or SQL script
- Database Connection — Connection details for target database
- Table Name — Where to load the data
- Field Mappings — How source fields map to database columns
- Loading Options — Batch size, conflict handling, validation rules
Model Outputs
The service produces:
- Load Status — Success/failure and row counts
- Error Messages — Details of any failures
- Data Validation Results — Quality checks and issues found
- Exported Data — For export operations, the resulting CSV or JSON
Supported Operations
Data Import
Load CSV
- Parse CSV files with headers
- Handle quoted fields and escaping
- Detect delimiters automatically
- Type inference for columns
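The Data Loader performs parsing, delimiter detection, and type inference internally. As a rough stand-alone illustration of the same steps, using only the Python standard library and SQLite as a stand-in target (the file name and column names below are made up):

```python
import csv
import sqlite3

def infer(value: str):
    """Best-effort type inference for a CSV cell: int, then float, else leave as text."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

with open("customers.csv", newline="", encoding="utf-8") as f:
    dialect = csv.Sniffer().sniff(f.read(2048))        # automatic delimiter detection
    f.seek(0)
    reader = csv.DictReader(f, dialect=dialect)         # first row treated as headers
    rows = [{k: infer(v) for k, v in row.items()} for row in reader]

conn = sqlite3.connect("dev.db")
conn.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER, name TEXT, balance REAL)")
conn.executemany("INSERT INTO customers VALUES (:id, :name, :balance)", rows)
conn.commit()
```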
Load JSON
- Import JSON arrays (one row per element)
- Handle nested objects (flatten or store as JSON)
- Batch processing for large files
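How nested objects end up in the table depends on the chosen option. The following sketch shows one common flattening approach (dotted column names, arrays kept as JSON text) and is illustrative rather than the service's exact behaviour:

```python
import json

def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested objects into dotted column names; store arrays as JSON strings."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, f"{name}."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat

doc = {"id": 1, "customer": {"name": "Ada", "address": {"city": "Leeds"}}, "tags": ["vip"]}
print(flatten(doc))
# {'id': 1, 'customer.name': 'Ada', 'customer.address.city': 'Leeds', 'tags': '["vip"]'}
```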
Execute SQL
- Run INSERT statements
- Execute complex transformations
- Handle dependencies between statements
- Transaction support for atomicity
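A stand-alone sketch of running a sequence of statements atomically, here with SQLite, shows the intent of transaction support: either every statement applies or none do. The tables and statements are illustrative:

```python
import sqlite3

statements = [
    "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)",
    "CREATE TABLE IF NOT EXISTS staging (id INTEGER, name TEXT)",
    "INSERT INTO staging (id, name) VALUES (1, 'Ada'), (2, 'Grace')",
    "INSERT OR IGNORE INTO customers SELECT id, name FROM staging",
]

conn = sqlite3.connect("dev.db")
try:
    with conn:                      # one transaction: commit on success, roll back on error
        for sql in statements:
            conn.execute(sql)
finally:
    conn.close()
```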
Data Export
Export to CSV
- Query database tables
- Apply filters and sorting
- Custom column selection
- Headers and formatting options
Export to JSON
- Convert query results to JSON
- Nested object support
- Streaming for large datasets
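For very large result sets, streaming one row at a time avoids holding the whole table in memory. A minimal sketch of that pattern, writing JSON Lines from an SQLite query (the table and file names are placeholders):

```python
import json
import sqlite3

conn = sqlite3.connect("dev.db")
conn.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.row_factory = sqlite3.Row

with open("customers.jsonl", "w", encoding="utf-8") as out:
    # Stream one row at a time so large tables never sit fully in memory.
    for row in conn.execute("SELECT id, name FROM customers ORDER BY id"):
        out.write(json.dumps(dict(row)) + "\n")
```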
Mock Data Generation
Generate Tables
- Create data matching schema automatically
- Realistic values for common types (names, emails, dates)
- Configurable row counts
- Foreign key relationship handling
Custom Generators
- Define custom value generators
- Specify distributions for numeric data
- Pattern-based string generation
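As a rough illustration of schema-driven mock data, the sketch below generates synthetic customers with the standard library only; in practice, dedicated libraries such as Faker provide richer value generators. All table and column names here are examples:

```python
import random
import sqlite3
import string
from datetime import date, timedelta

def fake_row(i: int) -> tuple:
    """One synthetic customer row: plausible name, email, signup date, and balance."""
    name = random.choice(string.ascii_uppercase) + "".join(
        random.choices(string.ascii_lowercase, k=6)
    )
    signup = date(2020, 1, 1) + timedelta(days=random.randint(0, 1500))
    balance = round(random.lognormvariate(3, 1), 2)      # skewed, positive amounts
    return (i, name, f"{name.lower()}@example.com", signup.isoformat(), balance)

conn = sqlite3.connect("dev.db")
conn.execute("""CREATE TABLE IF NOT EXISTS mock_customers
                (id INTEGER PRIMARY KEY, name TEXT, email TEXT, signup_date TEXT, balance REAL)""")
conn.executemany("INSERT OR REPLACE INTO mock_customers VALUES (?, ?, ?, ?, ?)",
                 (fake_row(i) for i in range(1, 1001)))   # configurable row count
conn.commit()
```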
Configuration Options
Key parameters you can configure:
- Database Connection — Host, port, database name, credentials
- Table Name — Target table for loading
- Data Format — CSV, JSON, SQL
- Batch Size — Number of rows per transaction (performance tuning)
- Conflict Resolution — Ignore duplicates, update, or fail
- Validation Rules — Data quality checks before loading
- Type Mappings — Explicit type conversions
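The exact configuration syntax depends on how the Data Loader is wired into your workflow. Purely as an illustration, the parameters above could be collected into something like the following; every key name here is hypothetical, not the service's actual schema:

```python
# Hypothetical load configuration expressed as a plain dictionary.
load_config = {
    "connection": {
        "host": "db.internal",
        "port": 5432,
        "database": "analytics",
        "user": "loader",            # credentials belong in a secrets store, not source control
    },
    "table": "customers",
    "format": "csv",
    "batch_size": 5000,              # rows per transaction
    "on_conflict": "upsert",         # ignore | update | fail
    "validation": ["not_null:id", "unique:email"],
    "type_mappings": {"signup_date": "date", "balance": "numeric(12,2)"},
}
```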
Loading Strategies
Append Mode
Add new rows to existing table:
- Preserves existing data
- Fast for incremental loads
- Risk of duplicates if not handled
Replace Mode
Delete existing data and load fresh:
- Ensures clean state
- Good for full refreshes
- Table is briefly unavailable while the data reloads
Upsert Mode
Update existing rows, insert new ones:
- Handles changes efficiently
- Requires unique key definition
- More complex but flexible
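Upsert syntax varies by database engine: PostgreSQL and SQLite use INSERT ... ON CONFLICT, while MySQL uses ON DUPLICATE KEY UPDATE. A small SQLite sketch of the pattern, assuming `id` is the unique key:

```python
import sqlite3

conn = sqlite3.connect("dev.db")
conn.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT, balance REAL)")

rows = [(1, "Ada", 120.0), (2, "Grace", 75.5)]
# Upsert: insert new rows, update existing ones that share the unique key.
conn.executemany(
    """INSERT INTO customers (id, name, balance) VALUES (?, ?, ?)
       ON CONFLICT(id) DO UPDATE SET name = excluded.name, balance = excluded.balance""",
    rows,
)
conn.commit()
```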
Data Quality
Validation
Pre-load checks:
- Type Validation — Ensure values match column types
- Required Fields — Check for missing mandatory data
- Constraints — Verify foreign keys, uniqueness, ranges
- Format Checks — Validate emails, URLs, phone numbers
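A minimal sketch of what such pre-load checks can look like, collecting problems per row so they can be reported together; the field names and rules are illustrative:

```python
import re

EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate(row: dict, errors: list) -> bool:
    """Pre-load checks: required fields, numeric types, email format."""
    ok = True
    if row.get("id") is None:
        errors.append(f"{row}: missing required field 'id'")
        ok = False
    if not isinstance(row.get("balance"), (int, float)):
        errors.append(f"{row}: 'balance' is not numeric")
        ok = False
    if row.get("email") and not EMAIL.match(row["email"]):
        errors.append(f"{row}: invalid email format")
        ok = False
    return ok

errors: list[str] = []
rows = [{"id": 1, "email": "ada@example.com", "balance": 10.0},
        {"id": None, "email": "not-an-email", "balance": "n/a"}]
valid = [r for r in rows if validate(r, errors)]
print(len(valid), "valid rows;", len(errors), "problems logged")
```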
Error Handling
When validation fails:
- Stop on Error — Abort load, no changes
- Skip Invalid Rows — Load valid rows, log errors
- Best Effort — Load what's possible, report issues
Integration with Other Models
The Data Loader service works well with:
- Data Ingestion — Ingest external data, then load to database
- Schedule Generation — Load requirements data for optimization
- Traffic Model — Load network topology and demand data
- Any Model — Prepare database inputs for model workflows
Performance Optimization
Bulk Loading
- Use large batch sizes (1000-10000 rows)
- Disable indexes during load, rebuild after
- Use transactions for atomicity
- Parallel loading for large datasets
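A rough sketch of batched loading with one transaction per batch, using SQLite as a stand-in target; the table, column names, and the 5000-row batch size are illustrative:

```python
import sqlite3
from itertools import islice

def batched(iterable, size):
    """Yield successive lists of up to `size` items (itertools.batched does this in 3.12+)."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

def bulk_load(conn: sqlite3.Connection, rows, batch_size: int = 5000) -> int:
    loaded = 0
    for chunk in batched(rows, batch_size):
        with conn:                                   # one transaction per batch
            conn.executemany("INSERT INTO events (ts, payload) VALUES (?, ?)", chunk)
        loaded += len(chunk)
    return loaded

conn = sqlite3.connect("dev.db")
conn.execute("CREATE TABLE IF NOT EXISTS events (ts TEXT, payload TEXT)")
print(bulk_load(conn, ((f"2024-01-01T00:00:{i % 60:02d}", f"event-{i}") for i in range(25_000))))
```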
Indexing Strategy
- Drop indexes before large loads
- Rebuild indexes after load completes
- Create only necessary indexes
- Monitor query patterns to optimize indexes
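The drop-and-rebuild pattern can be as simple as bracketing the load with index statements, sketched here for SQLite with placeholder index and table names:

```python
import sqlite3

conn = sqlite3.connect("dev.db")
conn.execute("CREATE TABLE IF NOT EXISTS events (ts TEXT, payload TEXT)")

conn.execute("DROP INDEX IF EXISTS idx_events_ts")            # drop before the bulk load
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 ((f"2024-01-01T00:00:{i:02d}", f"event-{i}") for i in range(60)))
conn.execute("CREATE INDEX IF NOT EXISTS idx_events_ts ON events (ts)")   # rebuild afterwards
conn.commit()
```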
Resource Management
- Limit concurrent connections
- Monitor memory usage for large loads
- Stream large files rather than loading into memory
- Clean up temporary resources
Getting Started
Basic Workflow
- Prepare Data — Format as CSV, JSON, or SQL
- Configure Connection — Set database credentials
- Define Mapping — Map source to target fields
- Add to Workflow — Drag Data Loader into canvas
- Load & Verify — Execute and check results
Example: CSV to Database
[Data Ingestion: Download CSV] → [Data Loader: Load CSV] → [Verify Data]

This workflow downloads a CSV file and loads it into a database table.
Example: Mock Data Setup
[Data Loader: Generate Mock] → [Run Test Workflow] → [Clean Up]

This workflow generates test data, runs a workflow against it, and cleans up afterward.
Best Practices
Data Preparation
- Validate Before Loading — Check data quality in source
- Clean Data — Remove or fix errors before load
- Consistent Encoding — Use UTF-8 for text
- Document Schema — Maintain clear schema documentation
Performance
- Batch Loading — Use large batches for efficiency
- Disable Constraints — Temporarily disable constraints during bulk loads, then re-enable them
- Parallel Loading — Split large datasets across connections
- Monitor Progress — Log row counts and timing
Reliability
- Use Transactions — Ensure all-or-nothing loads
- Backup First — Protect existing data before replace operations
- Validate After Load — Verify row counts and data integrity
- Log Everything — Track what was loaded when
Security
- Secure Credentials — Don't hardcode database passwords
- Limit Permissions — Use accounts with minimal required access
- Sanitize Inputs — Prevent SQL injection from data
- Audit Loads — Track who loaded what when
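The standard defence against SQL injection is to pass values as bound parameters rather than splicing them into the SQL text, as this small sketch shows:

```python
import sqlite3

conn = sqlite3.connect("dev.db")
conn.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)")

name = "Robert'); DROP TABLE customers;--"   # hostile value arriving in the source data

# Unsafe: string formatting splices the value into the SQL text.
# conn.execute(f"INSERT INTO customers (id, name) VALUES (3, '{name}')")

# Safe: placeholders keep data out of the SQL text entirely.
conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (3, name))
conn.commit()
```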
Troubleshooting
Load Fails with Type Errors
- Check source data types match target schema
- Review type conversion rules
- Look for invalid values (nulls in NOT NULL columns)
- Verify date/time formats
Duplicate Key Errors
- Use upsert mode instead of insert
- Add conflict resolution rules
- Check for duplicate data in source
- Verify unique constraints are correct
Performance Issues
- Increase batch size for bulk loads
- Disable indexes during load
- Use faster storage for temporary data
- Reduce concurrent connections
Connection Failures
- Verify database is accessible
- Check credentials and permissions
- Ensure connection limits aren't exceeded
- Review firewall and network rules
Next Steps
- Ingest external data: Data Ingestion
- Build a workflow: Building and Configuring Workflows
- Explore other models: Modelling Library
