🛠️ Advanced Configuration Guide

June 2025 • Shadow Shift Engineering Team

This guide covers Shadow Shift's configuration options for generating custom synthetic datasets, with control over data structure, relationships, and distributions. Use these techniques to generate production-grade test data tailored to your needs.

📐 Schema Definition

Shadow Shift uses a JSON-based schema definition to configure data generation. The schema describes your data structure and generation rules:

Basic Schema

{
  "schema": {
    "users": {
      "fields": {
        "id": { "type": "uuid" },
        "name": { "type": "full_name" },
        "email": { "type": "email" },
        "signup_date": {
          "type": "date",
          "range": ["2020-01-01", "2025-12-31"]
        }
      },
      "rows": 1000
    }
  }
}

Advanced Schema

{
  "schema": {
    "users": {
      "fields": {
        "id": { "type": "uuid", "primary_key": true },
        "account_type": { 
          "type": "enum",
          "values": ["free", "pro", "enterprise"],
          "distribution": [0.7, 0.2, 0.1]
        },
        "login_count": {
          "type": "integer",
          "min": 0,
          "max": 500,
          "distribution": "exponential"
        }
      },
      "rows": 5000,
      "relationships": [
        {
          "target": "orders",
          "type": "one_to_many",
          "field": "user_id"
        }
      ]
    }
  },
  "options": {
    "concurrency": 4,
    "batch_size": 1000
  }
}
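
The advanced schema above points a relationship at an `orders` table that the excerpt itself does not define. This guide does not say whether Shadow Shift requires the target to be declared in the same file. Assuming it does, a small pre-flight check along these lines can catch dangling targets before generation starts. The helper name `missing_relationship_targets` is hypothetical and not part of Shadow Shift.

```python
import json


def missing_relationship_targets(config: dict) -> list[tuple[str, str]]:
    """Return (table, target) pairs whose target table is not in the schema."""
    tables = config.get("schema", {})
    missing = []
    for table_name, table in tables.items():
        for rel in table.get("relationships", []):
            target = rel.get("target")
            if target not in tables:
                missing.append((table_name, target))
    return missing


# A trimmed copy of the advanced schema: "orders" is referenced but not defined.
config = json.loads("""
{
  "schema": {
    "users": {
      "fields": { "id": { "type": "uuid", "primary_key": true } },
      "rows": 5000,
      "relationships": [
        { "target": "orders", "type": "one_to_many", "field": "user_id" }
      ]
    }
  }
}
""")

print(missing_relationship_targets(config))  # [('users', 'orders')]
```

Adding an `orders` entry under `schema` would make the list come back empty.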

🔡 Field Types & Options

Shadow Shift supports numerous field types with customizable parameters:

Common Field Types

string: Text with length constraints
integer/float: Numeric ranges and distributions
date/datetime: Time-based data with ranges
enum: Predefined value sets with custom distributions
regex: Pattern-generated values
reference: Relational data links

Example: Custom Distribution

The weights in distribution are listed in the same order as values, so this field generates 80% active, 15% inactive, and 5% suspended:

"account_status": {
  "type": "enum",
  "values": ["active", "inactive", "suspended"],
  "distribution": [0.8, 0.15, 0.05]
}
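
To make the effect of the weights concrete, here is a minimal sketch of weighted enum sampling using Python's standard library. It illustrates the idea only and is not Shadow Shift's implementation; the function `sample_enum` and its `seed` parameter are assumptions made for the example.

```python
import random
from collections import Counter


def sample_enum(values, distribution, n, seed=0):
    """Draw n values, picking each one with its matching weight."""
    if len(values) != len(distribution):
        raise ValueError("values and distribution must have the same length")
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return rng.choices(values, weights=distribution, k=n)


field = {
    "type": "enum",
    "values": ["active", "inactive", "suspended"],
    "distribution": [0.8, 0.15, 0.05],
}

samples = sample_enum(field["values"], field["distribution"], n=10_000)
counts = Counter(samples)

# Observed shares should land close to the configured weights.
for value in field["values"]:
    print(value, counts[value] / len(samples))
```

With 10,000 samples the observed shares should sit within a percentage point or so of the configured weights.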

🔗 Data Relationships

Define complex relational data models with these relationship types:

Relationship Types

one_to_one: Direct record associations
one_to_many: Parent-child relationships
many_to_many: Junction table patterns
self_referencing: Hierarchical/tree data

"relationships": [
  {
    "target": "orders",
    "type": "one_to_many",
    "field": "user_id",
    "cardinality": {
      "min": 0,
      "max": 20,
      "distribution": "normal"
    }
  }
]
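
The guide does not state how a "normal" cardinality is parameterized. As a sketch under one possible interpretation, the example below centers the distribution on the midpoint of min and max, assumes the range spans roughly six standard deviations, and clamps every draw into the allowed range. The function name `children_per_parent` is hypothetical.

```python
import random


def children_per_parent(n_parents, min_count, max_count, seed=0):
    """Pick a child-row count for each parent from a clamped normal curve."""
    rng = random.Random(seed)
    mean = (min_count + max_count) / 2
    stddev = (max_count - min_count) / 6  # assumption: range covers about +/- 3 sigma
    counts = []
    for _ in range(n_parents):
        draw = round(rng.gauss(mean, stddev))
        # Clamp so no parent falls outside the configured min/max.
        counts.append(max(min_count, min(max_count, draw)))
    return counts


# Mirrors the cardinality block above: 0 to 20 orders per user.
orders_per_user = children_per_parent(n_parents=1000, min_count=0, max_count=20)
print(min(orders_per_user), max(orders_per_user), sum(orders_per_user))
```

The sum of the counts is the number of child rows the `orders` table would need, which is useful when estimating output size.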

⚡ Performance Optimization

Configure these options for large-scale data generation:

Generation Options

"options": {
  "concurrency": 4,
  "batch_size": 1000,
  "memory_limit": "2GB",
  "format": "ndjson",
  "compression": "gzip"
}

concurrency: Number of parallel threads
batch_size: Rows per batch
memory_limit: Memory cap
format: Output format
compression: On-the-fly compression
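
To illustrate why batch_size and on-the-fly compression matter for large outputs, the following sketch streams rows to a gzip-compressed NDJSON file one batch at a time, so only a single batch is held in memory. It uses only the Python standard library and is not Shadow Shift's writer; `batched` and `write_ndjson_gz` are hypothetical helpers.

```python
import gzip
import json
import os
import tempfile
from itertools import islice


def batched(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable."""
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        yield batch


def write_ndjson_gz(rows, path, batch_size=1000):
    """Write rows as gzip-compressed NDJSON, one batch at a time."""
    total = 0
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for batch in batched(rows, batch_size):
            out.write("".join(json.dumps(row) + "\n" for row in batch))
            total += len(batch)
    return total


# A generator stands in for the data source, so rows are produced lazily.
rows = ({"id": i, "login_count": i % 7} for i in range(2500))
path = os.path.join(tempfile.mkdtemp(), "users.ndjson.gz")
written = write_ndjson_gz(rows, path, batch_size=1000)

# Read the first record back to confirm the file is valid NDJSON.
with gzip.open(path, "rt", encoding="utf-8") as f:
    first = json.loads(f.readline())

print(written, first)  # 2500 {'id': 0, 'login_count': 0}
```

A smaller batch size lowers peak memory at the cost of more write calls, which matches the memory advice in the troubleshooting tips.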

🔧 Troubleshooting Tips

Validation Errors: Use strict: false to skip invalid data
Memory Issues: Reduce batch_size or enable compression
Slow Generation: Increase concurrency within system limits
Data Skew: Verify distribution sums to 1.0 for enum fields
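
The data skew tip can be automated before a run. The sketch below walks a configuration shaped like the schemas in this guide and reports enum fields whose weights do not match their values or do not sum to 1.0. The function `enum_distribution_problems` is a hypothetical helper, not a Shadow Shift API, and the floating-point tolerance is an assumption.

```python
import math


def enum_distribution_problems(config):
    """Return human-readable problems found in enum distributions."""
    problems = []
    for table_name, table in config.get("schema", {}).items():
        for field_name, field in table.get("fields", {}).items():
            weights = field.get("distribution")
            # Skip non-enum fields and named distributions such as "exponential".
            if field.get("type") != "enum" or not isinstance(weights, list):
                continue
            values = field.get("values", [])
            where = f"{table_name}.{field_name}"
            if len(weights) != len(values):
                problems.append(
                    f"{where}: {len(weights)} weights for {len(values)} values"
                )
            # Tolerate floating-point rounding when comparing with 1.0.
            if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
                problems.append(
                    f"{where}: weights sum to {sum(weights)}, expected 1.0"
                )
    return problems


config = {
    "schema": {
        "users": {
            "fields": {
                "account_type": {
                    "type": "enum",
                    "values": ["free", "pro", "enterprise"],
                    "distribution": [0.7, 0.2, 0.2],  # deliberately wrong
                },
                "login_count": {"type": "integer", "distribution": "exponential"},
            }
        }
    }
}

for problem in enum_distribution_problems(config):
    print(problem)
```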