SerenAI
Show HN: Publisher-Subscriber PostgreSQL Replication with xmin and Rust

Taariq Lewis

TL;DR: We built an open-source Rust CLI that replicates PostgreSQL databases without requiring wal_level=logical. It uses PostgreSQL's xmin system column to detect changes, enabling CDC-style replication from any managed PostgreSQL service—no configuration changes needed.

GitHub: https://github.com/serenorg/database-replicator

The Problem

PostgreSQL's logical replication is powerful but has a frustrating prerequisite: wal_level=logical. Most managed PostgreSQL services (Neon, Heroku, many AWS RDS configurations) default to wal_level=replica, and changing it often requires a database restart or isn't available at all.

We needed replication for SerenDB that worked with any PostgreSQL source—regardless of how it was configured. The solution: leverage PostgreSQL's transaction visibility system instead of the WAL.

How xmin-Based Replication Works

Every PostgreSQL row has a hidden system column called xmin—the transaction ID that created or last modified that row. By tracking xmin values, we can identify which rows changed since our last sync without requiring any special database configuration.

Here's the core algorithm:

1. Record the current max(xmin) as the high-water mark.
2. SELECT * FROM table WHERE xmin > last_sync_xmin
3. Upsert changed rows to the target (ON CONFLICT DO UPDATE).
4. Store the high-water mark for the next cycle.
5. Periodically reconcile deletes via primary-key comparison.


The beauty is that xmin is always available—it's part of PostgreSQL's MVCC implementation. No extensions, no configuration, no special privileges beyond SELECT.
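The comparison in step 2 hides one wrinkle: xmin is of type xid, which PostgreSQL will not cast directly to an integer, so a common workaround is to round-trip through text. A minimal sketch of building that query (change_query is an illustrative name, not the tool's actual API):

```rust
// Sketch of the change-detection query from step 2. The cast chain
// xmin::text::bigint is needed because xid has no direct integer cast.
// This comparison is wraparound-naive; wraparound is handled separately.
fn change_query(table: &str, last_sync_xmin: u32) -> String {
    format!("SELECT * FROM {table} WHERE xmin::text::bigint > {last_sync_xmin}")
}

fn main() {
    // Demo: produce the polling query for one table.
    println!("{}", change_query("users", 12345));
}
```

In a real daemon this string would be executed per table on each polling tick, with last_sync_xmin loaded from the stored high-water mark.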

Five Technical Decisions That Made It Work

1. Rust for Reliability and Performance

We chose Rust for zero-cost abstractions and memory safety in a long-running daemon. The tokio-postgres crate provides async database access, and rust_decimal handles PostgreSQL's numeric type without precision loss. Type safety caught numerous edge cases at compile time that would have been runtime bugs in other languages.

```rust
// Type-safe handling of 15+ PostgreSQL array types
match data_type.as_str() {
    "_text" | "_varchar" => {
        let val: Option<Vec<String>> = row.get(idx);
        Box::new(val)
    }
    "_int8" => {
        let val: Option<Vec<i64>> = row.get(idx);
        Box::new(val)
    }
    // ... handles _numeric, _jsonb, _timestamp, etc.
}
```

2. Using udt_name Instead of data_type

A subtle but critical detail: PostgreSQL's information_schema.columns.data_type returns "ARRAY" for all array types. To get the actual element type (_text, _int4, _jsonb), you need udt_name. This single change fixed a class of serialization errors that had been causing sync failures.
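The convention behind this fix is that array udt_names carry a leading underscore ("_text" is text[], "_int4" is integer[]). A small sketch of exploiting that (array_element_type is a hypothetical helper, not from the repo):

```rust
/// PostgreSQL's udt_name prefixes array types with an underscore,
/// so stripping it yields the element type. A plain udt_name with
/// no underscore means the column is a scalar.
fn array_element_type(udt_name: &str) -> Option<&str> {
    udt_name.strip_prefix('_')
}

fn main() {
    // "_jsonb" is a jsonb[] column; "text" is a scalar text column.
    assert_eq!(array_element_type("_jsonb"), Some("jsonb"));
    assert_eq!(array_element_type("text"), None);
}
```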

3. Batched Upserts with Composite Primary Keys

Rather than individual INSERTs, we batch changes into multi-row upserts:

```sql
INSERT INTO table (pk1, pk2, col1, col2)
VALUES ($1, $2, $3, $4), ($5, $6, $7, $8), ...
ON CONFLICT (pk1, pk2) DO UPDATE SET
  col1 = EXCLUDED.col1,
  col2 = EXCLUDED.col2

This reduces round-trips and handles both inserts and updates in one operation. Composite primary keys are fully supported.
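A hedged sketch of how such a statement can be generated for a batch of N rows (upsert_sql is an illustrative name; the repo's real logic lives in src/xmin/writer.rs and may differ):

```rust
/// Build a multi-row upsert with numbered placeholders ($1, $2, ...),
/// supporting composite primary keys. Only non-key columns appear in
/// the DO UPDATE SET clause.
fn upsert_sql(table: &str, pk_cols: &[&str], other_cols: &[&str], rows: usize) -> String {
    let all: Vec<&str> = pk_cols.iter().chain(other_cols.iter()).copied().collect();
    let width = all.len();
    // One "($1, $2, ...)" tuple per row, numbering placeholders across rows.
    let values: Vec<String> = (0..rows)
        .map(|r| {
            let ph: Vec<String> = (1..=width).map(|c| format!("${}", r * width + c)).collect();
            format!("({})", ph.join(", "))
        })
        .collect();
    let updates: Vec<String> = other_cols.iter().map(|c| format!("{c} = EXCLUDED.{c}")).collect();
    format!(
        "INSERT INTO {table} ({}) VALUES {} ON CONFLICT ({}) DO UPDATE SET {}",
        all.join(", "),
        values.join(", "),
        pk_cols.join(", "),
        updates.join(", ")
    )
}

fn main() {
    println!("{}", upsert_sql("t", &["pk1", "pk2"], &["col1", "col2"], 2));
}
```

Batching this way trades a single larger statement for many small ones; the batch size would be bounded in practice by PostgreSQL's 65535-parameter limit per statement.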

4. Transaction ID Wraparound Detection

PostgreSQL's xmin is a 32-bit transaction ID that wraps around after ~4 billion transactions. We detect wraparound by checking if the current xmin is significantly lower than our stored high-water mark, triggering a full table resync when detected. Silent data loss from missed wraparound was a risk we couldn't accept.
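A minimal sketch of that check, assuming "significantly lower" means more than half the 32-bit xid space below the stored mark (the exact threshold is an assumption, not the tool's documented value):

```rust
/// Heuristic wraparound detection. Normal forward progress never makes
/// the observed xmin drop more than half the 32-bit xid space below the
/// stored high-water mark; such a drop signals the counter has wrapped
/// and a full table resync is required.
fn wraparound_detected(current_xmin: u32, high_water_mark: u32) -> bool {
    current_xmin < high_water_mark && high_water_mark - current_xmin > u32::MAX / 2
}

fn main() {
    // After wraparound, xids restart near zero while the mark sits near u32::MAX.
    assert!(wraparound_detected(1_000, u32::MAX - 5));
    // A small backward-looking reading (e.g. from a stale snapshot) is not wraparound.
    assert!(!wraparound_detected(500, 1_000));
}
```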

5. Reconciliation for Delete Detection

xmin only tracks row modifications—it can't tell us about deletes. We solve this with periodic reconciliation: compare primary keys between source and target, delete any target rows missing from source. This runs on a configurable interval (default: daily) to balance consistency with performance.
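The reconciliation pass reduces to a set difference over primary keys. A sketch (keys_to_delete is an illustrative name; real keys may be composite tuples rather than single i64 values):

```rust
use std::collections::HashSet;

/// Rows whose primary keys exist on the target but not on the source
/// were deleted upstream and should be removed from the target.
fn keys_to_delete(source: &HashSet<i64>, target: &HashSet<i64>) -> Vec<i64> {
    let mut gone: Vec<i64> = target.difference(source).copied().collect();
    gone.sort_unstable(); // deterministic order for batched DELETEs
    gone
}

fn main() {
    let source: HashSet<i64> = [1, 2, 3].into_iter().collect();
    let target: HashSet<i64> = [1, 2, 3, 4, 5].into_iter().collect();
    // Rows 4 and 5 no longer exist on the source.
    assert_eq!(keys_to_delete(&source, &target), vec![4, 5]);
}
```

For large tables, streaming sorted key ranges and comparing chunk checksums would avoid holding both key sets in memory; the hash-set version above just shows the core idea.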

Using It

```shell
# Install
cargo install database-replicator

# Initial copy
database-replicator init \
  --source "postgresql://source-host/mydb" \
  --target "postgresql://target-host/mydb"

# Start continuous sync (auto-detects best method)
database-replicator sync \
  --source "postgresql://source-host/mydb" \
  --target "postgresql://target-host/mydb"
```

If your source has wal_level=logical, it uses native logical replication. If not, it automatically falls back to xmin-based polling. Zero configuration required.
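The fallback decision can be sketched as follows, assuming the tool inspects the result of SHOW wal_level on the source (the enum and function names here are illustrative, not the repo's API):

```rust
/// The two sync strategies the tool chooses between.
#[derive(Debug, PartialEq)]
enum SyncMethod {
    LogicalReplication,
    XminPolling,
}

/// Pick a method from the source's reported wal_level setting.
/// Anything other than "logical" (i.e. "replica" or "minimal")
/// falls back to xmin-based polling.
fn choose_method(wal_level: &str) -> SyncMethod {
    match wal_level {
        "logical" => SyncMethod::LogicalReplication,
        _ => SyncMethod::XminPolling,
    }
}

fn main() {
    // A managed service defaulting to wal_level=replica gets xmin polling.
    assert_eq!(choose_method("replica"), SyncMethod::XminPolling);
}
```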

xmin vs WAL: Trade-offs and Costs

Choosing between xmin-based sync and native logical replication involves trade-offs in latency, resource usage, and operational complexity.

Performance Comparison

| Metric | WAL (Logical Replication) | xmin Polling |
| --- | --- | --- |
| Latency | Sub-second (streaming) | Configurable (default: 1 second) |
| Source CPU | Minimal (reads WAL) | Higher (full table scans) |
| Source I/O | WAL only | Data pages + indexes |
| Delete detection | Immediate | Requires reconciliation |
| Configuration | wal_level=logical + restart + always-on compute | None |

When to Use Each

Use WAL replication when:

  • You control the source database configuration
  • You need real-time or near-real-time sync
  • The source has wal_level=logical already enabled
  • You're replicating high-volume OLTP workloads

Use xmin polling when:

  • Source is a managed service without logical replication
  • Hourly/daily sync is acceptable
  • You can't restart the source database
  • You need zero-config replication from any PostgreSQL

Compute Costs (Continuous Sync)

For xmin-based sync running 24/7 on cloud infrastructure:

| Instance | Use Case | Monthly Cost |
| --- | --- | --- |
| t3.micro | Light sync (small DB) | ~$8 |
| t3.small | Medium workloads | ~$15 |
| t3.medium | Heavy sync (many tables) | ~$30 |

WAL replication has no dedicated compute cost—it runs within your existing PostgreSQL process. However, it increases WAL volume and may require larger storage.

Our Recommendation

Start with xmin polling (it works everywhere). If you need lower latency and can enable wal_level=logical, the tool automatically upgrades to native replication—no code changes required.

Fork It

The entire codebase is Apache 2.0 licensed. Key extension points:

  • src/xmin/reader.rs - Change detection logic
  • src/xmin/writer.rs - Type conversion and batched writes
  • src/xmin/daemon.rs - Sync orchestration and scheduling
  • src/xmin/reconciler.rs - Delete detection

We'd love contributions for: additional source databases, smarter batching strategies, or webhook notifications on sync events.

About SerenAI

SerenAI is building infrastructure for AI agent data access. Agents are hungry for data, and they will pay to access the data in your database. We're creating the layer that powers secure, compliant enterprise data commerce and delivery for AI agents: agent identity verification, persistent memory via SerenDB, data access control, tiered data-access pricing, SOC2-ready compliance systems, and micropayments with settlement.

Our team brings decades of experience building enterprise databases and security systems. We believe AI agents should pay to access your data.



About Taariq Lewis

Exploring how to make developers faster and more productive with AI agents
