Data Loader
Overview
The Data Loader service loads data into databases from multiple formats and sources. It enables rapid ingestion from CSV, JSON, and SQL sources, plus mock data generation for testing and development.
This service is designed for data engineers, analysts, and developers who need to move data into databases quickly and reliably for workflows, testing, or production operations.
Key Capabilities
Multi-Format Data Loading
- CSV Import — Load data from comma-separated value files
- JSON Import — Ingest JSON documents and arrays
- SQL Execution — Run SQL scripts for complex loading logic
- Bulk Loading — Efficiently load large datasets
Mock Data Generation
- Synthetic Data — Generate realistic test data
- Schema-Based — Create data matching table schemas
- Configurable Volume — Generate from dozens to millions of rows
- Realistic Values — Names, addresses, dates, and numbers follow realistic patterns
Data Transformation
- Type Conversion — Automatic conversion to database types
- Field Mapping — Rename columns during load
- Filtering — Load only rows matching criteria
- Enrichment — Add computed columns or default values
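The service applies these transformations as part of the load itself; the snippet below is only a stand-alone sketch of the same idea, where the `mappings`, `defaults`, and `transform` names are illustrative rather than part of the Data Loader API.

```python
# Hypothetical sketch: rename columns, coerce types, filter rows, and add a default column.
from datetime import date, datetime

mappings = {"cust_name": "customer_name", "dob": "date_of_birth"}   # old -> new column names
defaults = {"load_date": date.today().isoformat()}                  # enrichment: computed column

def transform(row: dict) -> dict | None:
    """Rename fields, convert types, and drop rows that fail the filter."""
    out = {mappings.get(k, k): v for k, v in row.items()}
    if not out.get("customer_name"):                    # filtering: skip rows without a name
        return None
    out["date_of_birth"] = datetime.strptime(out["date_of_birth"], "%Y-%m-%d").date()
    out.update(defaults)
    return out

rows = [{"cust_name": "Ada", "dob": "1990-01-15"}, {"cust_name": "", "dob": "2000-06-30"}]
clean = [r for r in (transform(row) for row in rows) if r is not None]
print(clean)   # only the first row survives, renamed and enriched
```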
Database Management
- Table Creation — Create tables from data structure
- Schema Updates — Modify existing table schemas
- Index Management — Create indexes for query performance
- Data Export — Extract data to CSV or JSON
Use Cases
Development & Testing
Scenario: Developers need realistic test data in a database to validate application functionality.
Workflow:
- Define database schema
- Generate mock data matching schema
- Load into development database
- Run application tests against realistic data
Value: Test with realistic data without accessing production systems or creating data by hand.
Data Migration
Scenario: Migrate data from legacy systems to new database schema.
Workflow:
- Export data from legacy system to CSV
- Define field mappings (old → new schema)
- Transform and validate data
- Load into new database
- Verify data integrity
Value: Streamline migration with automated transformation and validation.
Workflow Data Preparation
Scenario: A workflow needs specific data in a database to run models.
Workflow:
- Prepare data in CSV or JSON format
- Load data into workflow database
- Run models that query the database
- Export results for analysis
Value: Set up data dependencies automatically as part of workflow execution.
Prototype Development
Scenario: Rapidly prototype a new application feature that requires database interaction.
Workflow:
- Generate mock data for testing
- Load into local database
- Develop and test feature
- Iterate quickly without production data access
Value: Accelerate development with instant, realistic test data.
Model Inputs
The Data Loader service accepts:
- Data Source — CSV file, JSON document, or SQL script
- Database Connection — Connection details for target database
- Table Name — Where to load the data
- Field Mappings — How source fields map to database columns
- Loading Options — Batch size, conflict handling, validation rules
Model Outputs
The service produces:
- Load Status — Success/failure and row counts
- Error Messages — Details of any failures
- Data Validation Results — Quality checks and issues found
- Exported Data — For export operations, the resulting CSV or JSON
Supported Operations
Data Import
Load CSV
- Parse CSV files with headers
- Handle quoted fields and escaping
- Detect delimiters automatically
- Type inference for columns
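The Data Loader performs parsing, delimiter detection, and type inference internally. As a rough stand-alone illustration of the same steps, using only the Python standard library and SQLite as a stand-in target (the file name and column names below are made up):

```python
import csv
import sqlite3

def infer(value: str):
    """Best-effort type inference for a CSV cell: int, then float, else leave as text."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

with open("customers.csv", newline="", encoding="utf-8") as f:
    dialect = csv.Sniffer().sniff(f.read(2048))        # automatic delimiter detection
    f.seek(0)
    reader = csv.DictReader(f, dialect=dialect)         # first row treated as headers
    rows = [{k: infer(v) for k, v in row.items()} for row in reader]

conn = sqlite3.connect("dev.db")
conn.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER, name TEXT, balance REAL)")
conn.executemany("INSERT INTO customers VALUES (:id, :name, :balance)", rows)
conn.commit()
```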
Load JSON
- Import JSON arrays (one row per element)
- Handle nested objects (flatten or store as JSON)
- Batch processing for large files
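How nested objects end up in the table depends on the chosen option. The following sketch shows one common flattening approach (dotted column names, arrays kept as JSON text) and is illustrative rather than the service's exact behaviour:

```python
import json

def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested objects into dotted column names; store arrays as JSON strings."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, f"{name}."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat

doc = {"id": 1, "customer": {"name": "Ada", "address": {"city": "Leeds"}}, "tags": ["vip"]}
print(flatten(doc))
# {'id': 1, 'customer.name': 'Ada', 'customer.address.city': 'Leeds', 'tags': '["vip"]'}
```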
Execute SQL
- Run INSERT statements
- Execute complex transformations
- Handle dependencies between statements
- Transaction support for atomicity
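A stand-alone sketch of running a sequence of statements atomically, here with SQLite, shows the intent of transaction support: either every statement applies or none do. The tables and statements are illustrative:

```python
import sqlite3

statements = [
    "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)",
    "CREATE TABLE IF NOT EXISTS staging (id INTEGER, name TEXT)",
    "INSERT INTO staging (id, name) VALUES (1, 'Ada'), (2, 'Grace')",
    "INSERT OR IGNORE INTO customers SELECT id, name FROM staging",
]

conn = sqlite3.connect("dev.db")
try:
    with conn:                      # one transaction: commit on success, roll back on error
        for sql in statements:
            conn.execute(sql)
finally:
    conn.close()
```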
Data Export
Export to CSV
- Query database tables
- Apply filters and sorting
- Custom column selection
- Headers and formatting options
Export to JSON
- Convert query results to JSON
- Nested object support
- Streaming for large datasets
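For very large result sets, streaming one row at a time avoids holding the whole table in memory. A minimal sketch of that pattern, writing JSON Lines from an SQLite query (the table and file names are placeholders):

```python
import json
import sqlite3

conn = sqlite3.connect("dev.db")
conn.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.row_factory = sqlite3.Row

with open("customers.jsonl", "w", encoding="utf-8") as out:
    # Stream one row at a time so large tables never sit fully in memory.
    for row in conn.execute("SELECT id, name FROM customers ORDER BY id"):
        out.write(json.dumps(dict(row)) + "\n")
```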
Mock Data Generation
Generate Tables
- Create data matching schema automatically
- Realistic values for common types (names, emails, dates)
- Configurable row counts
- Foreign key relationship handling
Custom Generators
- Define custom value generators
- Specify distributions for numeric data
- Pattern-based string generation
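As a rough illustration of schema-driven mock data, the sketch below generates synthetic customers with the standard library only; in practice, dedicated libraries such as Faker provide richer value generators. All table and column names here are examples:

```python
import random
import sqlite3
import string
from datetime import date, timedelta

def fake_row(i: int) -> tuple:
    """One synthetic customer row: plausible name, email, signup date, and balance."""
    name = random.choice(string.ascii_uppercase) + "".join(
        random.choices(string.ascii_lowercase, k=6)
    )
    signup = date(2020, 1, 1) + timedelta(days=random.randint(0, 1500))
    balance = round(random.lognormvariate(3, 1), 2)      # skewed, positive amounts
    return (i, name, f"{name.lower()}@example.com", signup.isoformat(), balance)

conn = sqlite3.connect("dev.db")
conn.execute("""CREATE TABLE IF NOT EXISTS mock_customers
                (id INTEGER PRIMARY KEY, name TEXT, email TEXT, signup_date TEXT, balance REAL)""")
conn.executemany("INSERT OR REPLACE INTO mock_customers VALUES (?, ?, ?, ?, ?)",
                 (fake_row(i) for i in range(1, 1001)))   # configurable row count
conn.commit()
```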
Configuration Options
Key parameters you can configure:
- Database Connection — Host, port, database name, credentials
- Table Name — Target table for loading
- Data Format — CSV, JSON, SQL
- Batch Size — Number of rows per transaction (performance tuning)
- Conflict Resolution — Ignore duplicates, update, or fail
- Validation Rules — Data quality checks before loading
- Type Mappings — Explicit type conversions
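The exact configuration syntax depends on how the Data Loader is wired into your workflow. Purely as an illustration, the parameters above could be collected into something like the following; every key name here is hypothetical, not the service's actual schema:

```python
# Hypothetical load configuration expressed as a plain dictionary.
load_config = {
    "connection": {
        "host": "db.internal",
        "port": 5432,
        "database": "analytics",
        "user": "loader",            # credentials belong in a secrets store, not source control
    },
    "table": "customers",
    "format": "csv",
    "batch_size": 5000,              # rows per transaction
    "on_conflict": "upsert",         # ignore | update | fail
    "validation": ["not_null:id", "unique:email"],
    "type_mappings": {"signup_date": "date", "balance": "numeric(12,2)"},
}
```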
Loading Strategies
Append Mode
Add new rows to existing table:
- Preserves existing data
- Fast for incremental loads
- Risk of duplicates if not handled
Replace Mode
Delete existing data and load fresh:
- Ensures clean state
- Good for full refreshes
- Table is briefly unavailable while the data reloads
Upsert Mode
Update existing rows, insert new ones:
- Handles changes efficiently
- Requires unique key definition
- More complex but flexible
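Upsert syntax varies by database engine: PostgreSQL and SQLite use INSERT ... ON CONFLICT, while MySQL uses ON DUPLICATE KEY UPDATE. A small SQLite sketch of the pattern, assuming `id` is the unique key:

```python
import sqlite3

conn = sqlite3.connect("dev.db")
conn.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT, balance REAL)")

rows = [(1, "Ada", 120.0), (2, "Grace", 75.5)]
# Upsert: insert new rows, update existing ones that share the unique key.
conn.executemany(
    """INSERT INTO customers (id, name, balance) VALUES (?, ?, ?)
       ON CONFLICT(id) DO UPDATE SET name = excluded.name, balance = excluded.balance""",
    rows,
)
conn.commit()
```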
Data Quality
Validation
Pre-load checks:
- Type Validation — Ensure values match column types
- Required Fields — Check for missing mandatory data
- Constraints — Verify foreign keys, uniqueness, ranges
- Format Checks — Validate emails, URLs, phone numbers
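A minimal sketch of what such pre-load checks can look like, collecting problems per row so they can be reported together; the field names and rules are illustrative:

```python
import re

EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate(row: dict, errors: list) -> bool:
    """Pre-load checks: required fields, numeric types, email format."""
    ok = True
    if row.get("id") is None:
        errors.append(f"{row}: missing required field 'id'")
        ok = False
    if not isinstance(row.get("balance"), (int, float)):
        errors.append(f"{row}: 'balance' is not numeric")
        ok = False
    if row.get("email") and not EMAIL.match(row["email"]):
        errors.append(f"{row}: invalid email format")
        ok = False
    return ok

errors: list[str] = []
rows = [{"id": 1, "email": "ada@example.com", "balance": 10.0},
        {"id": None, "email": "not-an-email", "balance": "n/a"}]
valid = [r for r in rows if validate(r, errors)]
print(len(valid), "valid rows;", len(errors), "problems logged")
```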
Error Handling
When validation fails:
- Stop on Error — Abort load, no changes
- Skip Invalid Rows — Load valid rows, log errors
- Best Effort — Load what's possible, report issues
Integration with Other Models
The Data Loader service works well with:
- Data Ingestion — Ingest external data, then load to database
- Schedule Generation — Load requirements data for optimization
- Traffic Model — Load network topology and demand data
- Any Model — Prepare database inputs for model workflows
Performance Optimization
Bulk Loading
- Use large batch sizes (1000-10000 rows)
- Disable indexes during load, rebuild after
- Use transactions for atomicity
- Parallel loading for large datasets
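A rough sketch of batched loading with one transaction per batch, using SQLite as a stand-in target; the table, column names, and the 5000-row batch size are illustrative:

```python
import sqlite3
from itertools import islice

def batched(iterable, size):
    """Yield successive lists of up to `size` items (itertools.batched does this in 3.12+)."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

def bulk_load(conn: sqlite3.Connection, rows, batch_size: int = 5000) -> int:
    loaded = 0
    for chunk in batched(rows, batch_size):
        with conn:                                   # one transaction per batch
            conn.executemany("INSERT INTO events (ts, payload) VALUES (?, ?)", chunk)
        loaded += len(chunk)
    return loaded

conn = sqlite3.connect("dev.db")
conn.execute("CREATE TABLE IF NOT EXISTS events (ts TEXT, payload TEXT)")
print(bulk_load(conn, ((f"2024-01-01T00:00:{i % 60:02d}", f"event-{i}") for i in range(25_000))))
```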
Indexing Strategy
- Drop indexes before large loads
- Rebuild indexes after load completes
- Create only necessary indexes
- Monitor query patterns to optimize indexes
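The drop-and-rebuild pattern can be as simple as bracketing the load with index statements, sketched here for SQLite with placeholder index and table names:

```python
import sqlite3

conn = sqlite3.connect("dev.db")
conn.execute("CREATE TABLE IF NOT EXISTS events (ts TEXT, payload TEXT)")

conn.execute("DROP INDEX IF EXISTS idx_events_ts")            # drop before the bulk load
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 ((f"2024-01-01T00:00:{i:02d}", f"event-{i}") for i in range(60)))
conn.execute("CREATE INDEX IF NOT EXISTS idx_events_ts ON events (ts)")   # rebuild afterwards
conn.commit()
```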
Resource Management
- Limit concurrent connections
- Monitor memory usage for large loads
- Stream large files rather than loading into memory
- Clean up temporary resources
Getting Started
Basic Workflow
- Prepare Data — Format as CSV, JSON, or SQL
- Configure Connection — Set database credentials
- Define Mapping — Map source to target fields
- Add to Workflow — Drag Data Loader into canvas
- Load & Verify — Execute and check results
Example: CSV to Database
[Data Ingestion: Download CSV] → [Data Loader: Load CSV] → [Verify Data]

This workflow downloads a CSV file and loads it into a database table.
Example: Mock Data Setup
[Data Loader: Generate Mock] → [Run Test Workflow] → [Clean Up]

This workflow generates test data, runs a workflow against it, and cleans up afterward.
Best Practices
Data Preparation
- Validate Before Loading — Check data quality in source
- Clean Data — Remove or fix errors before load
- Consistent Encoding — Use UTF-8 for text
- Document Schema — Maintain clear schema documentation
Performance
- Batch Loading — Use large batches for efficiency
- Disable Constraints — Temporarily disable constraints during bulk loads, then re-enable them
- Parallel Loading — Split large datasets across connections
- Monitor Progress — Log row counts and timing
Reliability
- Use Transactions — Ensure all-or-nothing loads
- Backup First — Protect existing data before replace operations
- Validate After Load — Verify row counts and data integrity
- Log Everything — Track what was loaded when
Security
- Secure Credentials — Don't hardcode database passwords
- Limit Permissions — Use accounts with minimal required access
- Sanitize Inputs — Prevent SQL injection from data
- Audit Loads — Track who loaded what when
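The standard defence against SQL injection is to pass values as bound parameters rather than splicing them into the SQL text, as this small sketch shows:

```python
import sqlite3

conn = sqlite3.connect("dev.db")
conn.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)")

name = "Robert'); DROP TABLE customers;--"   # hostile value arriving in the source data

# Unsafe: string formatting splices the value into the SQL text.
# conn.execute(f"INSERT INTO customers (id, name) VALUES (3, '{name}')")

# Safe: placeholders keep data out of the SQL text entirely.
conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (3, name))
conn.commit()
```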
Troubleshooting
Load Fails with Type Errors
- Check source data types match target schema
- Review type conversion rules
- Look for invalid values (nulls in NOT NULL columns)
- Verify date/time formats
Duplicate Key Errors
- Use upsert mode instead of insert
- Add conflict resolution rules
- Check for duplicate data in source
- Verify unique constraints are correct
Performance Issues
- Increase batch size for bulk loads
- Disable indexes during load
- Use faster storage for temporary data
- Reduce concurrent connections
Connection Failures
- Verify database is accessible
- Check credentials and permissions
- Ensure connection limits aren't exceeded
- Review firewall and network rules
Next Steps
- Ingest external data: Data Ingestion
- Build a workflow: Building and Configuring Workflows
- Explore other models: Modelling Library
