Moving a live datastore without losing a row

Decomposed a brownfield Django app with DDD and ran a feature-flagged dual-write migration from MongoDB to PostgreSQL — holding 99.9% availability through the cutover.

Context

Dobare was already a live, revenue-bearing loyalty SaaS when its data layer became the constraint. The original Django codebase persisted to MongoDB, but the product had outgrown a document store: loyalty balances, transactions, and settlement needed real relational integrity, multi-row transactions, and the kind of constraints you can actually rely on under concurrency. The target was PostgreSQL. The catch was that the system could not stop — merchants were transacting against it daily.

Constraints

No maintenance window. A “migrate over the weekend” cutover was off the table; the platform had to keep serving reads and writes throughout.
No lost writes. Loyalty points are money-adjacent. A dropped write mid-migration is a customer dispute, not a log line.
Brownfield coupling. Domain logic was tangled across the app, so there was no single seam to swap the database behind.

Approach

The migration had to be earned in two phases: first make the change possible, then make it safe.

I decomposed the codebase into bounded contexts using Domain-Driven Design, so each domain — points, transactions, merchants, campaigns — owned its own data and could be migrated on its own schedule instead of in one high-risk big-bang flip.

On top of that I ran a feature-flagged dual-write: every write went to both MongoDB and PostgreSQL, while reads were shifted to Postgres gradually, one domain at a time, behind flags. A background reconciliation job backfilled history and continuously compared the two stores until they reached parity for a domain — only then did that domain’s reads flip, and only then did its Mongo write get retired.

app
 ├─▶ mongo      (old · read)
 └─▶ postgres   (new · dual-write)
       │
  reconcile ──▶ flip read ──▶ retire mongo
  · rollback-safe at every step ·

Decision — dual-write over ETL-then-switch. A one-shot export/import would have needed a freeze to stay consistent. Dual-writing cost more code and a temporary double-write penalty, but it made the cutover reversible at any moment and removed the need for downtime. For money-adjacent data, reversibility was worth the overhead.

Decision — migrate by domain, not by table. Bounded contexts gave me natural migration units. Each domain could be validated against production traffic in isolation, so a problem in one never put the whole platform at risk.

Outcome

The cutover held 99.9% availability with no data loss and no maintenance window. Just as valuable as the storage swap: the codebase came out the other side decomposed into domain-aligned modules, which is what later made the modular-monolith architecture and per-domain scaling tractable at all.

What I’d revisit

The dual-write reconciliation logic was bespoke per domain. With hindsight I’d invest earlier in a single, generic compare-and-report harness — parity checking was the most repeated and most anxiety-inducing part of every domain cutover, and it deserved to be a first-class tool rather than something reassembled each time.