🛠️ Advanced Configuration Guide
This guide covers Shadow Shift's configuration options for generating custom synthetic datasets with precise control over data structure, relationships, and distributions. Use these techniques to produce production-grade test data tailored to your needs.
📐 Schema Definition
Shadow Shift uses a JSON-based schema definition to configure data generation. The schema describes your data structure and generation rules:
Basic Schema
{
  "schema": {
    "users": {
      "fields": {
        "id": { "type": "uuid" },
        "name": { "type": "full_name" },
        "email": { "type": "email" },
        "signup_date": {
          "type": "date",
          "range": ["2020-01-01", "2025-12-31"]
        }
      },
      "rows": 1000
    }
  }
}
Advanced Schema
{
  "schema": {
    "users": {
      "fields": {
        "id": { "type": "uuid", "primary_key": true },
        "account_type": {
          "type": "enum",
          "values": ["free", "pro", "enterprise"],
          "distribution": [0.7, 0.2, 0.1]
        },
        "login_count": {
          "type": "integer",
          "min": 0,
          "max": 500,
          "distribution": "exponential"
        }
      },
      "rows": 5000,
      "relationships": [
        {
          "target": "orders",
          "type": "one_to_many",
          "field": "user_id"
        }
      ]
    }
  },
  "options": {
    "concurrency": 4,
    "batch_size": 1000
  }
}
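The advanced schema declares a relationship to an "orders" table that the snippet itself does not define. Whether Shadow Shift requires every relationship target to be declared in the same schema is not stated here, so the following is only a sketch of a pre-flight check you could run yourself. The function name missing_relationship_targets is hypothetical and not part of Shadow Shift.

```python
import json


def missing_relationship_targets(config: dict) -> list:
    """Return (table, target) pairs whose relationship target is not a defined table."""
    tables = config.get("schema", {})
    missing = []
    for table_name, table in tables.items():
        for rel in table.get("relationships", []):
            target = rel.get("target")
            if target not in tables:
                missing.append((table_name, target))
    return missing


# Trimmed copy of the advanced schema above (most fields omitted for brevity).
config = json.loads("""
{
  "schema": {
    "users": {
      "fields": {"id": {"type": "uuid", "primary_key": true}},
      "rows": 5000,
      "relationships": [
        {"target": "orders", "type": "one_to_many", "field": "user_id"}
      ]
    }
  },
  "options": {"concurrency": 4, "batch_size": 1000}
}
""")

print(missing_relationship_targets(config))  # [('users', 'orders')]
```

If the check reports a pair such as ('users', 'orders'), either add the missing table to the schema or confirm that the target is meant to be defined elsewhere.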
🔡 Field Types & Options
Shadow Shift supports numerous field types with customizable parameters:
Common Field Types
- string: Text with length constraints
- integer/float: Numeric ranges and distributions
- date/datetime: Time-based data with ranges
- enum: Predefined value sets with custom distributions
- regex: Pattern-generated values
- reference: Relational data links
Example: Custom Distribution
"account_status": {
  "type": "enum",
  "values": ["active", "inactive", "suspended"],
  "distribution": [0.8, 0.15, 0.05] // 80% active, 15% inactive, 5% suspended
}
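To get a feel for what a weighted enum distribution produces, the sketch below draws values with Python's standard random.choices using the same values and weights as the example. It illustrates weighted sampling in general and is not a description of how Shadow Shift samples internally; the helper name sample_enum is made up for this example.

```python
import random
from collections import Counter

values = ["active", "inactive", "suspended"]
weights = [0.8, 0.15, 0.05]


def sample_enum(values, weights, n, seed=0):
    """Draw n values, choosing each one with probability proportional to its weight."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same sample
    return rng.choices(values, weights=weights, k=n)


sample = sample_enum(values, weights, 10_000)
counts = Counter(sample)

# Observed share of each value; these should land close to the configured weights.
shares = {v: counts[v] / len(sample) for v in values}
print(shares)
```

With 10,000 draws the observed shares should sit within a percentage point or so of 0.8, 0.15, and 0.05. Small datasets will deviate more, which is worth remembering before treating a small sample as skewed.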
🔗 Data Relationships
Define complex relational data models with these relationship types:
Relationship Types
- one_to_one: Direct record associations
- one_to_many: Parent-child relationships
- many_to_many: Junction table patterns
- self_referencing: Hierarchical/tree data
"relationships": [
  {
    "target": "orders",
    "type": "one_to_many",
    "field": "user_id",
    "cardinality": {
      "min": 0,
      "max": 20,
      "distribution": "normal"
    }
  }
]
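The cardinality block bounds how many child rows each parent receives, which also determines roughly how large the child table becomes. The sketch below estimates that size under one possible reading of "normal": a normal distribution centred on the midpoint of min and max, with a standard deviation of one sixth of the range, rounded and clamped to the bounds. That reading is an assumption made for illustration; the guide does not say how Shadow Shift parameterises the distribution, and child_counts is a hypothetical helper.

```python
import random


def child_counts(n_parents, lo, hi, seed=0):
    """Draw one child count per parent from a normal distribution clamped to [lo, hi]."""
    rng = random.Random(seed)
    mean = (lo + hi) / 2
    std = (hi - lo) / 6  # assumption: about 99.7% of raw draws fall inside [lo, hi]
    counts = []
    for _ in range(n_parents):
        draw = round(rng.gauss(mean, std))
        counts.append(min(hi, max(lo, draw)))  # clamp rare out-of-range draws
    return counts


# 5000 parents, matching "rows": 5000 in the advanced schema; bounds from the cardinality block.
counts = child_counts(5000, 0, 20)
print(min(counts), max(counts), sum(counts))
```

Under these assumptions the average is close to 10 children per parent, so 5,000 users would yield on the order of 50,000 orders. An estimate like this helps when choosing batch_size and memory_limit for the child table.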
⚡ Performance Optimization
Configure these options for large-scale data generation:
Generation Options
"options": {
  "concurrency": 4,       // Parallel threads
  "batch_size": 1000,     // Rows per batch
  "memory_limit": "2GB",  // Memory cap
  "format": "ndjson",     // Output format
  "compression": "gzip"   // On-the-fly compression
}
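With "format": "ndjson" and "compression": "gzip", the output would be a gzip-compressed file holding one JSON object per line. The sketch below shows one way to read such a file back using only the Python standard library. The file name users.ndjson.gz and the record fields are placeholders; the guide does not specify how Shadow Shift names its output files, so the sketch writes a small stand-in file first to stay self-contained.

```python
import gzip
import json
import os
import tempfile


def read_ndjson_gz(path):
    """Yield one dict per non-empty line of a gzip-compressed NDJSON file."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


# Write a tiny stand-in file so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), "users.ndjson.gz")  # hypothetical file name
with gzip.open(path, "wt", encoding="utf-8") as fh:
    for i in range(3):
        fh.write(json.dumps({"id": i, "account_type": "free"}) + "\n")

rows = list(read_ndjson_gz(path))
print(len(rows), rows[0])
```

Because read_ndjson_gz is a generator, a large file can be processed row by row without loading it fully into memory.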
🔧 Troubleshooting Tips
- Validation Errors: Use strict: false to skip invalid data
- Memory Issues: Reduce batch_size or enable compression
- Slow Generation: Increase concurrency within system limits
- Data Skew: Verify distribution sums to 1.0 for enum fields
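The data skew tip can be automated before a generation run. The sketch below walks a schema dictionary shaped like the examples in this guide and reports enum fields whose weights do not sum to 1.0 or whose weight count differs from the number of values. It is a stand-alone check with a hypothetical name, check_enum_distributions, not a Shadow Shift API, and the "status" field is deliberately wrong to show a reported problem.

```python
import math


def check_enum_distributions(config, tol=1e-9):
    """Return human-readable problems found in enum distribution weights."""
    problems = []
    for table_name, table in config.get("schema", {}).items():
        for field_name, spec in table.get("fields", {}).items():
            dist = spec.get("distribution")
            # Only list-style weights on enum fields are checked here.
            if spec.get("type") != "enum" or not isinstance(dist, list):
                continue
            where = f"{table_name}.{field_name}"
            n_values = len(spec.get("values", []))
            if len(dist) != n_values:
                problems.append(f"{where}: {len(dist)} weights for {n_values} values")
            total = sum(dist)
            # Compare with a tolerance, since 0.7 + 0.2 + 0.1 is not exactly 1.0 in floating point.
            if not math.isclose(total, 1.0, abs_tol=tol):
                problems.append(f"{where}: weights sum to {total:.4f}, expected 1.0")
    return problems


config = {
    "schema": {
        "users": {
            "fields": {
                "account_type": {
                    "type": "enum",
                    "values": ["free", "pro", "enterprise"],
                    "distribution": [0.7, 0.2, 0.1],
                },
                "status": {
                    "type": "enum",
                    "values": ["active", "inactive", "suspended"],
                    "distribution": [0.8, 0.15, 0.1],  # deliberately wrong: sums to 1.05
                },
            }
        }
    }
}

for problem in check_enum_distributions(config):
    print(problem)
```

The tolerance matters: comparing the sum to 1.0 with == would flag valid weights such as [0.7, 0.2, 0.1] because of floating-point rounding.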