SerenAI
Show HN: Publisher-Subscriber PostgreSQL Replication with xmin and Rust

Taariq Lewis

TL;DR: We built an open-source Rust CLI that replicates PostgreSQL databases without requiring wal_level=logical. It uses PostgreSQL's xmin system column to detect changes, enabling CDC-style replication from any managed PostgreSQL service—no configuration changes needed.

GitHub: https://github.com/serenorg/database-replicator

The Problem

PostgreSQL's logical replication is powerful but has a frustrating prerequisite: wal_level=logical. Most managed PostgreSQL services (Neon, Heroku, many AWS RDS configurations) default to wal_level=replica, and changing it often requires a database restart or isn't available at all.

We needed replication for SerenDB that worked with any PostgreSQL source—regardless of how it was configured. The solution: leverage PostgreSQL's transaction visibility system instead of the WAL.

How xmin-Based Replication Works

Every PostgreSQL row has a hidden system column called xmin—the transaction ID that created or last modified that row. By tracking xmin values, we can identify which rows changed since our last sync without requiring any special database configuration.

Here's the core algorithm:

1. Record the current max(xmin) as the high-water mark.
2. SELECT * FROM table WHERE xmin > last_sync_xmin
3. Upsert changed rows to the target (ON CONFLICT DO UPDATE).
4. Store the high-water mark for the next cycle.
5. Periodically reconcile deletes via primary-key comparison.


The beauty is that xmin is always available—it's part of PostgreSQL's MVCC implementation. No extensions, no configuration, no special privileges beyond SELECT.
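The comparison in step 2 hides one wrinkle: xmin is of type xid, which PostgreSQL will not cast directly to an integer, so a common workaround is to round-trip through text. A minimal sketch of building that query (change_query is an illustrative name, not the tool's actual API):

```rust
// Sketch of the change-detection query from step 2. The cast chain
// xmin::text::bigint is needed because xid has no direct integer cast.
// This comparison is wraparound-naive; wraparound is handled separately.
fn change_query(table: &str, last_sync_xmin: u32) -> String {
    format!("SELECT * FROM {table} WHERE xmin::text::bigint > {last_sync_xmin}")
}

fn main() {
    // Demo: produce the polling query for one table.
    println!("{}", change_query("users", 12345));
}
```

In a real daemon this string would be executed per table on each polling tick, with last_sync_xmin loaded from the stored high-water mark.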

Five Technical Decisions That Made It Work

1. Rust for Reliability and Performance

We chose Rust for zero-cost abstractions and memory safety in a long-running daemon. The tokio-postgres crate provides async database access, and rust_decimal handles PostgreSQL's numeric type without precision loss. Type safety caught numerous edge cases at compile time that would have been runtime bugs in other languages.

```rust
// Type-safe handling of 15+ PostgreSQL array types
match data_type.as_str() {
    "_text" | "_varchar" => {
        let val: Option<Vec<String>> = row.get(idx);
        Box::new(val)
    }
    "_int8" => {
        let val: Option<Vec<i64>> = row.get(idx);
        Box::new(val)
    }
    // ... handles _numeric, _jsonb, _timestamp, etc.
}
```

2. Using udt_name Instead of data_type

A subtle but critical detail: PostgreSQL's information_schema.columns.data_type returns "ARRAY" for all array types. To get the actual element type (_text, _int4, _jsonb), you need udt_name. This single change fixed a class of serialization errors that had been causing sync failures.
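The convention behind this fix is that array udt_names carry a leading underscore ("_text" is text[], "_int4" is integer[]). A small sketch of exploiting that (array_element_type is a hypothetical helper, not from the repo):

```rust
/// PostgreSQL's udt_name prefixes array types with an underscore,
/// so stripping it yields the element type. A plain udt_name with
/// no underscore means the column is a scalar.
fn array_element_type(udt_name: &str) -> Option<&str> {
    udt_name.strip_prefix('_')
}

fn main() {
    // "_jsonb" is a jsonb[] column; "text" is a scalar text column.
    assert_eq!(array_element_type("_jsonb"), Some("jsonb"));
    assert_eq!(array_element_type("text"), None);
}
```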

3. Batched Upserts with Composite Primary Keys

Rather than individual INSERTs, we batch changes into multi-row upserts:

```sql
INSERT INTO table (pk1, pk2, col1, col2)
VALUES ($1, $2, $3, $4), ($5, $6, $7, $8), ...
ON CONFLICT (pk1, pk2) DO UPDATE SET
  col1 = EXCLUDED.col1,
  col2 = EXCLUDED.col2

This reduces round-trips and handles both inserts and updates in one operation. Composite primary keys are fully supported.
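A hedged sketch of how such a statement can be generated for a batch of N rows (upsert_sql is an illustrative name; the repo's real logic lives in src/xmin/writer.rs and may differ):

```rust
/// Build a multi-row upsert with numbered placeholders ($1, $2, ...),
/// supporting composite primary keys. Only non-key columns appear in
/// the DO UPDATE SET clause.
fn upsert_sql(table: &str, pk_cols: &[&str], other_cols: &[&str], rows: usize) -> String {
    let all: Vec<&str> = pk_cols.iter().chain(other_cols.iter()).copied().collect();
    let width = all.len();
    // One "($1, $2, ...)" tuple per row, numbering placeholders across rows.
    let values: Vec<String> = (0..rows)
        .map(|r| {
            let ph: Vec<String> = (1..=width).map(|c| format!("${}", r * width + c)).collect();
            format!("({})", ph.join(", "))
        })
        .collect();
    let updates: Vec<String> = other_cols.iter().map(|c| format!("{c} = EXCLUDED.{c}")).collect();
    format!(
        "INSERT INTO {table} ({}) VALUES {} ON CONFLICT ({}) DO UPDATE SET {}",
        all.join(", "),
        values.join(", "),
        pk_cols.join(", "),
        updates.join(", ")
    )
}

fn main() {
    println!("{}", upsert_sql("t", &["pk1", "pk2"], &["col1", "col2"], 2));
}
```

Batching this way trades a single larger statement for many small ones; the batch size would be bounded in practice by PostgreSQL's 65535-parameter limit per statement.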

4. Transaction ID Wraparound Detection

PostgreSQL's xmin is a 32-bit transaction ID that wraps around after ~4 billion transactions. We detect wraparound by checking if the current xmin is significantly lower than our stored high-water mark, triggering a full table resync when detected. Silent data loss from missed wraparound was a risk we couldn't accept.
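A minimal sketch of that check, assuming "significantly lower" means more than half the 32-bit xid space below the stored mark (the exact threshold is an assumption, not the tool's documented value):

```rust
/// Heuristic wraparound detection. Normal forward progress never makes
/// the observed xmin drop more than half the 32-bit xid space below the
/// stored high-water mark; such a drop signals the counter has wrapped
/// and a full table resync is required.
fn wraparound_detected(current_xmin: u32, high_water_mark: u32) -> bool {
    current_xmin < high_water_mark && high_water_mark - current_xmin > u32::MAX / 2
}

fn main() {
    // After wraparound, xids restart near zero while the mark sits near u32::MAX.
    assert!(wraparound_detected(1_000, u32::MAX - 5));
    // A small backward-looking reading (e.g. from a stale snapshot) is not wraparound.
    assert!(!wraparound_detected(500, 1_000));
}
```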

5. Reconciliation for Delete Detection

xmin only tracks row modifications—it can't tell us about deletes. We solve this with periodic reconciliation: compare primary keys between source and target, delete any target rows missing from source. This runs on a configurable interval (default: daily) to balance consistency with performance.
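The reconciliation pass reduces to a set difference over primary keys. A sketch (keys_to_delete is an illustrative name; real keys may be composite tuples rather than single i64 values):

```rust
use std::collections::HashSet;

/// Rows whose primary keys exist on the target but not on the source
/// were deleted upstream and should be removed from the target.
fn keys_to_delete(source: &HashSet<i64>, target: &HashSet<i64>) -> Vec<i64> {
    let mut gone: Vec<i64> = target.difference(source).copied().collect();
    gone.sort_unstable(); // deterministic order for batched DELETEs
    gone
}

fn main() {
    let source: HashSet<i64> = [1, 2, 3].into_iter().collect();
    let target: HashSet<i64> = [1, 2, 3, 4, 5].into_iter().collect();
    // Rows 4 and 5 no longer exist on the source.
    assert_eq!(keys_to_delete(&source, &target), vec![4, 5]);
}
```

For large tables, streaming sorted key ranges and comparing chunk checksums would avoid holding both key sets in memory; the hash-set version above just shows the core idea.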

Using It

```shell
# Install
cargo install database-replicator

# Initial copy
database-replicator init \
  --source "postgresql://source-host/mydb" \
  --target "postgresql://target-host/mydb"

# Start continuous sync (auto-detects best method)
database-replicator sync \
  --source "postgresql://source-host/mydb" \
  --target "postgresql://target-host/mydb"
```

If your source has wal_level=logical, it uses native logical replication. If not, it automatically falls back to xmin-based polling. Zero configuration required.
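The fallback decision can be sketched as follows, assuming the tool inspects the result of SHOW wal_level on the source (the enum and function names here are illustrative, not the repo's API):

```rust
/// The two sync strategies the tool chooses between.
#[derive(Debug, PartialEq)]
enum SyncMethod {
    LogicalReplication,
    XminPolling,
}

/// Pick a method from the source's reported wal_level setting.
/// Anything other than "logical" (i.e. "replica" or "minimal")
/// falls back to xmin-based polling.
fn choose_method(wal_level: &str) -> SyncMethod {
    match wal_level {
        "logical" => SyncMethod::LogicalReplication,
        _ => SyncMethod::XminPolling,
    }
}

fn main() {
    // A managed service defaulting to wal_level=replica gets xmin polling.
    assert_eq!(choose_method("replica"), SyncMethod::XminPolling);
}
```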

xmin vs WAL: Trade-offs and Costs

Choosing between xmin-based sync and native logical replication involves trade-offs in latency, resource usage, and operational complexity.

Performance Comparison

| Metric | WAL (Logical Replication) | xmin Polling |
| --- | --- | --- |
| Latency | Sub-second (streaming) | Configurable (default: 1 second) |
| Source CPU | Minimal (reads WAL) | Higher (full table scans) |
| Source I/O | WAL only | Data pages + indexes |
| Delete detection | Immediate | Requires reconciliation |
| Configuration | wal_level=logical + restart + always-on compute | None |

When to Use Each

Use WAL replication when:

  • You control the source database configuration
  • You need real-time or near-real-time sync
  • The source has wal_level=logical already enabled
  • You're replicating high-volume OLTP workloads

Use xmin polling when:

  • Source is a managed service without logical replication
  • Hourly/daily sync is acceptable
  • You can't restart the source database
  • You need zero-config replication from any PostgreSQL

Compute Costs (Continuous Sync)

For xmin-based sync running 24/7 on cloud infrastructure:

| Instance | Use Case | Monthly Cost |
| --- | --- | --- |
| t3.micro | Light sync (small DB) | ~$8 |
| t3.small | Medium workloads | ~$15 |
| t3.medium | Heavy sync (many tables) | ~$30 |

WAL replication has no dedicated compute cost—it runs within your existing PostgreSQL process. However, it increases WAL volume and may require larger storage.

Our Recommendation

Start with xmin polling (it works everywhere). If you need lower latency and can enable wal_level=logical, the tool automatically upgrades to native replication—no code changes required.

Fork It

The entire codebase is Apache 2.0 licensed. Key extension points:

  • src/xmin/reader.rs - Change detection logic
  • src/xmin/writer.rs - Type conversion and batched writes
  • src/xmin/daemon.rs - Sync orchestration and scheduling
  • src/xmin/reconciler.rs - Delete detection

We'd love contributions for: additional source databases, smarter batching strategies, or webhook notifications on sync events.

About SerenAI

SerenAI is building infrastructure for AI agent data access. Agents are hungry for data, and they will pay to access the data in your database. We're creating the layer that powers secure, compliant enterprise data commerce and delivery for AI agents: agent identity verification, persistent memory via SerenDB, data access control, tiered data-access pricing, SOC2-ready compliance systems, and micropayments with settlement.

Our team brings decades of experience building enterprise databases and security systems. We believe AI agents should pay to access your data.



About Taariq Lewis

Exploring how to make developers faster and more productive with AI agents
