The Ops Community ⚙️

Romaric Philogène


An open-source tool to seed a development database with real data

A group of contributors and I have created RepliByte, an open-source tool to seed a development database from a production database.

Features 🔥

  • Supports data backup and restore for PostgreSQL, MySQL, and MongoDB
  • Replaces sensitive data with fake data
  • Works on large databases (> 10 GB) (see Design)
  • Database subsetting: scales a production database down to a more manageable size
  • Starts a local database with the production data in a single command
  • On-the-fly data compression and decompression (Zlib)
  • On-the-fly data encryption and decryption (AES-256)
  • Fully stateless (no server, no daemon), shipped as a lightweight binary
  • Supports custom transformers
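To make the "replace sensitive data" and "on-the-fly compression" features more concrete, here is a minimal Python sketch of the general idea: mask sensitive columns row by row with transformer functions, then compress the resulting dump as a stream with zlib. This is not RepliByte's implementation or configuration format; the row format, the `TRANSFORMERS` mapping, and the functions `fake_email`, `redact`, `transform_row`, `compress_stream`, and `decompress_stream` are hypothetical illustrations. Encryption is left out because Python's standard library has no AES implementation.

```python
import hashlib
import zlib

# Hypothetical rows, standing in for records read from a production dump.
ROWS = [
    {"id": 1, "name": "Alice Martin", "email": "alice@example.com"},
    {"id": 2, "name": "Bob Durand", "email": "bob@example.com"},
]


def fake_email(value: str) -> str:
    """Replace an email with a deterministic fake address.

    Hashing keeps the mapping stable, so the same real address always
    maps to the same fake one (handy for joins and unique constraints).
    """
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    return f"user_{digest}@example.invalid"


def redact(value: str) -> str:
    """Keep the first character and mask the rest with '*'."""
    return value[:1] + "*" * (len(value) - 1)


# Hypothetical mapping of column name -> transformer function.
TRANSFORMERS = {"email": fake_email, "name": redact}


def transform_row(row: dict) -> dict:
    """Apply the matching transformer to each sensitive column; copy the rest."""
    return {
        col: TRANSFORMERS[col](val) if col in TRANSFORMERS else val
        for col, val in row.items()
    }


def compress_stream(lines, level=6):
    """Compress an iterable of text lines incrementally with zlib."""
    compressor = zlib.compressobj(level)
    for line in lines:
        chunk = compressor.compress(line.encode("utf-8"))
        if chunk:
            yield chunk
    yield compressor.flush()


def decompress_stream(chunks):
    """Inverse of compress_stream: yield decompressed bytes chunk by chunk."""
    decompressor = zlib.decompressobj()
    for chunk in chunks:
        data = decompressor.decompress(chunk)
        if data:
            yield data
    tail = decompressor.flush()
    if tail:
        yield tail


if __name__ == "__main__":
    masked = [transform_row(r) for r in ROWS]
    lines = (f"{r['id']},{r['name']},{r['email']}\n" for r in masked)
    compressed = list(compress_stream(lines))
    restored = b"".join(decompress_stream(compressed)).decode("utf-8")
    print(restored)
```

Processing the dump as a stream of chunks, rather than loading it whole, is one way a tool can keep memory usage bounded on large databases; the demo collects the compressed chunks into a list only for brevity.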

My motivation 🏃‍♂️

As a developer, I find creating a fake dataset for running tests tedious. Plus, fake data does not reflect real-world data and is painful to keep up to date. If you prefer to run your app tests against production data, RepliByte is for you as well.

RepliByte is available for macOS, Linux, and Windows.

https://github.com/qovery/replibyte

Top comments (1)

milena Accuweb.cloud

Nice tool! Seeding dev databases with realistic data is always a challenge, especially when dealing with large datasets and sensitive information.

For teams working with MongoDB, tools like this can be really useful for creating staging or testing environments that closely match production while still masking sensitive data. One thing to keep in mind is that performance during backup/restore and dataset handling also depends a lot on the underlying infrastructure.

Running such workflows on scalable platforms like AccuWeb.Cloud can help handle large MongoDB datasets more efficiently, especially when dealing with database subsetting, compression, and frequent test environment resets. Having flexible storage and compute resources makes a big difference when working with data-heavy DevOps pipelines.

I'm curious whether anyone here has tested it with large MongoDB clusters in production-like environments?