An open-source tool to seed a development database with real data

#dataops

A bunch of contributors and I have created RepliByte, an open-source tool to seed a development database from a production database.

Features 🔥

Support data backup and restore for PostgreSQL, MySQL, and MongoDB
Replace sensitive data with fake data
Works on large databases (> 10GB) (read Design)
Database Subsetting: Scale down a production database to a more reasonable size
Start a local database with the prod data in a single command
On-the-fly data (de)compression (Zlib)
On-the-fly data encryption/decryption (AES-256)
Fully stateless (no server, no daemon) and lightweight binary
Use custom transformers
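
To make the idea behind "replace sensitive data with fake data" and on-the-fly compression more concrete, here is a minimal conceptual sketch in Python. It is not RepliByte's API or configuration format: the names `fake_email`, `redact`, `transform_rows`, and `compress_stream`, as well as the sample row, are hypothetical and only illustrate the general technique of applying per-column transformers to rows and compressing the result as a stream.

```python
import hashlib
import zlib
from typing import Callable, Dict, Iterable, Iterator

Row = Dict[str, str]
Transformer = Callable[[str], str]


def fake_email(value: str) -> str:
    """Hypothetical transformer: map an email to a deterministic fake one."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:10]
    return f"user_{digest}@example.com"


def redact(value: str) -> str:
    """Hypothetical transformer: hide the value but keep its length."""
    return "*" * len(value)


def transform_rows(rows: Iterable[Row], rules: Dict[str, Transformer]) -> Iterator[Row]:
    """Apply the matching transformer to each column; leave other columns as-is."""
    for row in rows:
        yield {col: rules[col](val) if col in rules else val for col, val in row.items()}


def compress_stream(lines: Iterable[str]) -> Iterator[bytes]:
    """Compress lines chunk by chunk so the full dump never sits in memory."""
    compressor = zlib.compressobj()
    for line in lines:
        chunk = compressor.compress((line + "\n").encode("utf-8"))
        if chunk:
            yield chunk
    yield compressor.flush()


# Made-up sample data, for illustration only.
rows = [{"id": "1", "email": "alice@corp.com", "ssn": "123-45-6789"}]
rules = {"email": fake_email, "ssn": redact}

masked = list(transform_rows(rows, rules))
dump_lines = [",".join(row.values()) for row in masked]
compressed = b"".join(compress_stream(dump_lines))
```

Because `fake_email` hashes its input, the same production value always maps to the same fake value, which keeps references between tables consistent in this sketch. Whether a real tool behaves this way depends on the transformer you pick.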

My motivation 🏃‍♂️

As a developer, creating a fake dataset for running tests is tedious. Plus, it does not reflect real-world data and is painful to keep updated. If you prefer to run your app tests with production data, then RepliByte is for you as well.

Available for macOS, Linux, and Windows.

https://github.com/qovery/replibyte

Top comments (1)

milena Accuweb.cloud • Mar 19

Nice tool. Seeding dev databases with realistic data is always a challenge, especially when dealing with large datasets and sensitive information.

For teams working with MongoDB, tools like this can be really useful for creating staging or testing environments that closely match production while still masking sensitive data. One thing to keep in mind is that performance during backup/restore and dataset handling also depends a lot on the underlying infrastructure.

Running such workflows on scalable platforms like AccuWeb.Cloud can help handle large MongoDB datasets more efficiently, especially when dealing with database subsetting, compression, and frequent test environment resets. Having flexible storage and compute resources makes a big difference when working with data-heavy DevOps pipelines.

Curious to know if anyone here has tested it with large MongoDB clusters in production-like environments?