INC-001 · zero-downtime-migration · RESOLVED

Moving a live datastore without losing a row

Decomposed a brownfield Django app with DDD and ran a feature-flagged dual-write migration from MongoDB to PostgreSQL — holding 99.9% availability through the cutover.

Python · Django · PostgreSQL · MongoDB · Feature flags · Domain-Driven Design

Context

Dobare was already a live, revenue-bearing loyalty SaaS when its data layer became the constraint. The original Django codebase persisted to MongoDB, but the product had outgrown a document store: loyalty balances, transactions, and settlement needed real relational integrity, multi-row transactions, and the kind of constraints you can actually rely on under concurrency. The target was PostgreSQL. The catch was that the system could not stop — merchants were transacting against it daily.

Constraints

  • No maintenance window. A “migrate over the weekend” cutover was off the table; the platform had to keep serving reads and writes throughout.
  • No lost writes. Loyalty points are money-adjacent. A dropped write mid-migration is a customer dispute, not a log line.
  • Brownfield coupling. Domain logic was tangled across the app, so there was no single seam to swap the database behind.

Approach

The migration had to be earned in two phases: first make the change possible, then make it safe.

I decomposed the codebase into bounded contexts using Domain-Driven Design, so each domain — points, transactions, merchants, campaigns — owned its own data and could be migrated on its own schedule instead of in one high-risk big-bang flip.

On top of that I ran a feature-flagged dual-write: every write went to both MongoDB and PostgreSQL, while reads were shifted to Postgres gradually, one domain at a time, behind flags. A background reconciliation job backfilled history and continuously compared the two stores until they reached parity for a domain — only then did that domain’s reads flip, and only then did its Mongo write get retired.

app
 ├─▶ mongo      (old · read)
 └─▶ postgres   (new · dual-write)

  reconcile ──▶ flip read ──▶ retire mongo
  · rollback-safe at every step ·
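
A minimal sketch of the flagged dual-write path, to make the flow above concrete. The names here (InMemoryStore, DomainFlags, DualWriteRepository) are hypothetical stand-ins, not the production code; real clients for MongoDB and PostgreSQL would replace the in-memory stores, and the flags would come from whatever flag service is in use.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class InMemoryStore:
    """Stand-in for a real MongoDB or PostgreSQL client (hypothetical)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, dict] = {}

    def put(self, key: str, row: dict) -> None:
        self.rows[key] = dict(row)

    def get(self, key: str) -> Optional[dict]:
        row = self.rows.get(key)
        return dict(row) if row is not None else None


@dataclass
class DomainFlags:
    """Per-domain switches (assumed shape; not from the original system)."""
    read_from_postgres: bool = False  # flipped once the domain reaches parity
    write_to_mongo: bool = True       # turned off last, when Mongo is retired


class DualWriteRepository:
    """Routes one domain's reads and writes according to its flags."""

    def __init__(self, domain: str, mongo: InMemoryStore,
                 postgres: InMemoryStore, flags: DomainFlags) -> None:
        self.domain = domain
        self.mongo = mongo
        self.postgres = postgres
        self.flags = flags

    def save(self, key: str, row: dict) -> None:
        # PostgreSQL always receives the write; Mongo does until retired.
        if self.flags.write_to_mongo:
            self.mongo.put(key, row)
        self.postgres.put(key, row)

    def load(self, key: str) -> Optional[dict]:
        # The read source is a flag, so flipping back is a config change.
        source = self.postgres if self.flags.read_from_postgres else self.mongo
        return source.get(key)


flags = DomainFlags()
repo = DualWriteRepository("points", InMemoryStore("mongo"),
                           InMemoryStore("postgres"), flags)
repo.save("user-1", {"balance": 10})
flags.read_from_postgres = True   # flip reads for this domain only
print(repo.load("user-1"))
```

The sketch deliberately ignores partial failure (one store accepting a write and the other rejecting it). In a setup like the one described, that kind of divergence is what the reconciliation job would need to surface.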

Decision — dual-write over ETL-then-switch. A one-shot export/import would have needed a freeze to stay consistent. Dual-writing cost more code and a temporary double-write penalty, but it made the cutover reversible at any moment and removed the need for downtime. For money-adjacent data, reversibility was worth the overhead.

Decision — migrate by domain, not by table. Bounded contexts gave me natural migration units. Each domain could be validated against production traffic in isolation, so a problem in one never put the whole platform at risk.
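
One way to picture per-domain migration is as a small state machine in which each domain advances independently and only when its parity check passes. The sketch below is illustrative only: Stage, DomainCutover, and the parity_ok argument are assumed names, and the three stages simply mirror the reconcile, flip read, retire mongo sequence shown earlier.

```python
from enum import IntEnum
from typing import Dict, Iterable


class Stage(IntEnum):
    DUAL_WRITE = 0      # writes go to both stores, reads come from Mongo
    READ_POSTGRES = 1   # writes go to both stores, reads come from PostgreSQL
    MONGO_RETIRED = 2   # reads and writes use PostgreSQL only


class DomainCutover:
    """Tracks each domain's migration stage independently of the others."""

    def __init__(self, domains: Iterable[str]) -> None:
        self.stages: Dict[str, Stage] = {d: Stage.DUAL_WRITE for d in domains}

    def advance(self, domain: str, parity_ok: bool) -> Stage:
        """Move one stage forward, but only if the parity check passed."""
        current = self.stages[domain]
        if parity_ok and current is not Stage.MONGO_RETIRED:
            self.stages[domain] = Stage(current + 1)
        return self.stages[domain]

    def rollback(self, domain: str) -> Stage:
        """Step back one stage. Not modelled here: after MONGO_RETIRED, Mongo
        would first need the writes it missed replayed into it."""
        current = self.stages[domain]
        if current is not Stage.DUAL_WRITE:
            self.stages[domain] = Stage(current - 1)
        return self.stages[domain]


cutover = DomainCutover(["points", "transactions", "merchants", "campaigns"])
cutover.advance("points", parity_ok=True)      # points flips its reads
cutover.advance("merchants", parity_ok=False)  # merchants stays put
print({name: stage.name for name, stage in cutover.stages.items()})
```

Because each domain carries its own stage, a failed parity check or a rollback in one domain leaves the others untouched.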

Outcome

The cutover held 99.9% availability with no data loss and no maintenance window. Just as valuable as the storage swap: the codebase came out the other side decomposed into domain-aligned modules, which is what later made the modular-monolith architecture and per-domain scaling tractable at all.

What I’d revisit

The dual-write reconciliation logic was bespoke per domain. With hindsight I’d invest earlier in a single, generic compare-and-report harness — parity checking was the most repeated and most anxiety-inducing part of every domain cutover, and it deserved to be a first-class tool rather than something reassembled each time.
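
As a rough illustration of what such a generic harness could look like (this is a sketch written after the fact, not the tooling that was used): each domain supplies two fetch callables and an optional normalizer, and the comparison and report shape stay the same everywhere. All names below (ParityReport, compare_domain, normalize) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]
Fetch = Callable[[], Iterable[Tuple[str, Row]]]  # yields (business key, row)


@dataclass
class ParityReport:
    domain: str
    missing_in_new: List[str] = field(default_factory=list)
    missing_in_old: List[str] = field(default_factory=list)
    mismatched: List[str] = field(default_factory=list)

    @property
    def at_parity(self) -> bool:
        return not (self.missing_in_new or self.missing_in_old or self.mismatched)


def compare_domain(domain: str, fetch_old: Fetch, fetch_new: Fetch,
                   normalize: Callable[[Row], Row] = lambda row: row) -> ParityReport:
    """Compare two stores key by key and report every difference.

    `normalize` strips store-specific noise (e.g. Mongo's `_id`) so that
    only meaningful differences are flagged.
    """
    old = {key: normalize(row) for key, row in fetch_old()}
    new = {key: normalize(row) for key, row in fetch_new()}
    report = ParityReport(domain)
    for key in sorted(old.keys() | new.keys()):
        if key not in new:
            report.missing_in_new.append(key)
        elif key not in old:
            report.missing_in_old.append(key)
        elif old[key] != new[key]:
            report.mismatched.append(key)
    return report


def drop_mongo_id(row: Row) -> Row:
    return {k: v for k, v in row.items() if k != "_id"}


mongo_rows = [("u1", {"_id": "abc", "balance": 5}), ("u2", {"balance": 7})]
postgres_rows = [("u1", {"balance": 5}), ("u2", {"balance": 8})]
report = compare_domain("points", lambda: mongo_rows, lambda: postgres_rows,
                        normalize=drop_mongo_id)
print(report.at_parity, report.mismatched)
```

This version loads both sides into memory for clarity; a production harness would presumably page through keys in batches and tolerate writes that land mid-comparison.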
