Show HN: Publisher-Subscriber PostgreSQL Replication with xmin and Rust
TL;DR: We built an open-source Rust CLI that replicates PostgreSQL databases without requiring wal_level=logical. It uses PostgreSQL's xmin system column to detect changes, enabling CDC-style replication from any managed PostgreSQL service—no configuration changes needed.
GitHub: https://github.com/serenorg/database-replicator
The Problem
PostgreSQL's logical replication is powerful but has a frustrating prerequisite: wal_level=logical. Most managed PostgreSQL services (Neon, Heroku, many AWS RDS configurations) default to wal_level=replica, and changing it often requires a database restart or isn't available at all.
We needed replication for SerenDB that worked with any PostgreSQL source—regardless of how it was configured. The solution: leverage PostgreSQL's transaction visibility system instead of the WAL.
How xmin-Based Replication Works
Every PostgreSQL row has a hidden system column called xmin—the transaction ID that created or last modified that row. By tracking xmin values, we can identify which rows changed since our last sync without requiring any special database configuration.
Here's the core algorithm:
1. Record current max(xmin) as high_water_mark
2. SELECT * FROM table WHERE xmin > last_sync_xmin
3. UPSERT changed rows to target (ON CONFLICT DO UPDATE)
4. Store high_water_mark for next cycle
5. Periodically reconcile deletes via primary key comparison

The beauty is that xmin is always available: it's part of PostgreSQL's MVCC implementation. No extensions, no configuration, no special privileges beyond SELECT.
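In code, one polling cycle reduces to two queries. Here's a minimal sketch with tokio-postgres, assuming the common xid-to-bigint cast for comparing xmin values and a trusted table name; wraparound handling (decision 4 below) is omitted:

```rust
use tokio_postgres::{Client, Error, Row};

// One xmin polling cycle for a single table (steps 1-2 above; the caller
// performs the upsert and persists the new high-water mark).
async fn poll_changes(
    source: &Client,
    table: &str,
    last_sync_xmin: i64,
) -> Result<(i64, Vec<Row>), Error> {
    // Step 1: record the current max(xmin) as the high-water mark.
    let hwm_sql = format!("SELECT COALESCE(max(xmin::text::bigint), 0) FROM {table}");
    let high_water_mark: i64 = source.query_one(&hwm_sql, &[]).await?.get(0);

    // Step 2: fetch every row created or modified since the last sync.
    let changes_sql = format!("SELECT * FROM {table} WHERE xmin::text::bigint > $1");
    let changed = source.query(&changes_sql, &[&last_sync_xmin]).await?;

    Ok((high_water_mark, changed))
}
```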
Five Technical Decisions That Made It Work
1. Rust for Reliability and Performance
We chose Rust for zero-cost abstractions and memory safety in a long-running daemon. The tokio-postgres crate provides async database access, and rust_decimal handles PostgreSQL's numeric type without precision loss. Type safety caught numerous edge cases at compile time that would have been runtime bugs in other languages.
```rust
// Type-safe handling of 15+ PostgreSQL array types: each arm decodes the
// column into the matching Rust type and boxes it for the batched writer.
let value: Box<dyn ToSql + Sync> = match data_type.as_str() {
    "_text" | "_varchar" => {
        let val: Option<Vec<String>> = row.get(idx);
        Box::new(val)
    }
    "_int8" => {
        let val: Option<Vec<i64>> = row.get(idx);
        Box::new(val)
    }
    // ... handles _numeric, _jsonb, _timestamp, etc.
};
```

2. Using udt_name Instead of data_type
A subtle but critical detail: PostgreSQL's information_schema.columns.data_type returns "ARRAY" for all array types. To get the actual element type (_text, _int4, _jsonb), you need udt_name. This single change fixed a class of serialization errors that had been causing sync failures.
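For instance, assuming a connected tokio-postgres Client, you can see the difference directly (the table name here is illustrative):

```rust
use tokio_postgres::{Client, Error};

// Print data_type vs udt_name for each column. For a text[] column this
// yields something like: tags | ARRAY | _text
async fn show_column_types(client: &Client, table: &str) -> Result<(), Error> {
    let rows = client
        .query(
            "SELECT column_name::text, data_type::text, udt_name::text \
             FROM information_schema.columns WHERE table_name = $1",
            &[&table],
        )
        .await?;
    for row in rows {
        let (name, data_type, udt_name): (String, String, String) =
            (row.get(0), row.get(1), row.get(2));
        println!("{name} | {data_type} | {udt_name}");
    }
    Ok(())
}
```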
3. Batched Upserts with Composite Primary Keys
Rather than individual INSERTs, we batch changes into multi-row upserts:
```sql
INSERT INTO table (pk1, pk2, col1, col2)
VALUES ($1, $2, $3, $4), ($5, $6, $7, $8), ...
ON CONFLICT (pk1, pk2) DO UPDATE SET
  col1 = EXCLUDED.col1,
  col2 = EXCLUDED.col2
```

This reduces round-trips and handles both inserts and updates in one operation. Composite primary keys are fully supported.
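Generating that statement for a batch is mostly placeholder arithmetic. A simplified sketch with hard-coded column names and a single-column key; the actual writer (src/xmin/writer.rs) handles arbitrary columns and composite keys:

```rust
// Build a multi-row upsert with numbered placeholders ($1, $2, ...).
// The columns (pk, col1, col2) are hard-coded for illustration only.
fn build_upsert(table: &str, batch_len: usize) -> String {
    let values: Vec<String> = (0..batch_len)
        .map(|i| format!("(${}, ${}, ${})", i * 3 + 1, i * 3 + 2, i * 3 + 3))
        .collect();
    format!(
        "INSERT INTO {table} (pk, col1, col2) VALUES {} \
         ON CONFLICT (pk) DO UPDATE SET col1 = EXCLUDED.col1, col2 = EXCLUDED.col2",
        values.join(", ")
    )
}
```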
4. Transaction ID Wraparound Detection
PostgreSQL's xmin is a 32-bit transaction ID that wraps around after ~4 billion transactions. We detect wraparound by checking if the current xmin is significantly lower than our stored high-water mark, triggering a full table resync when detected. Silent data loss from missed wraparound was a risk we couldn't accept.
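The check itself is a one-liner. A sketch of the idea; the slack constant below is an assumed tolerance, not a value taken from the tool:

```rust
// If the newly observed max(xmin) sits far below the stored high-water
// mark, the 32-bit transaction counter has almost certainly wrapped.
const SLACK: i64 = 1_000_000; // assumed tolerance, not the tool's setting

fn wrapped_around(current_max_xmin: i64, high_water_mark: i64) -> bool {
    current_max_xmin + SLACK < high_water_mark
}
```

On detection, the sync falls back to a full table resync rather than risk silently missing changes.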
5. Reconciliation for Delete Detection
xmin only tracks row modifications—it can't tell us about deletes. We solve this with periodic reconciliation: compare primary keys between source and target, delete any target rows missing from source. This runs on a configurable interval (default: daily) to balance consistency with performance.
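A reconciliation pass can be as simple as a set difference over primary keys. A minimal sketch assuming a single bigint id key and key sets that fit in memory; the shipped reconciler (src/xmin/reconciler.rs) is what actually runs:

```rust
use std::collections::HashSet;
use tokio_postgres::{Client, Error};

// Delete target rows whose primary keys no longer exist on the source.
async fn reconcile_deletes(
    source: &Client,
    target: &Client,
    table: &str,
) -> Result<u64, Error> {
    let select = format!("SELECT id FROM {table}");
    let source_ids: HashSet<i64> =
        source.query(&select, &[]).await?.iter().map(|r| r.get(0)).collect();
    let target_ids: HashSet<i64> =
        target.query(&select, &[]).await?.iter().map(|r| r.get(0)).collect();

    // Keys present on the target but missing from the source were deleted
    // upstream since the last pass.
    let delete = format!("DELETE FROM {table} WHERE id = $1");
    let mut removed = 0;
    for id in target_ids.difference(&source_ids) {
        removed += target.execute(&delete, &[id]).await?;
    }
    Ok(removed)
}
```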
Using It
```bash
# Install
cargo install database-replicator

# Initial copy
database-replicator init \
  --source "postgresql://source-host/mydb" \
  --target "postgresql://target-host/mydb"

# Start continuous sync (auto-detects best method)
database-replicator sync \
  --source "postgresql://source-host/mydb" \
  --target "postgresql://target-host/mydb"
```

If your source has wal_level=logical, it uses native logical replication. If not, it automatically falls back to xmin-based polling. Zero configuration required.
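The auto-detection boils down to probing the source's settings. A minimal sketch assuming wal_level is the only input; the shipped CLI may check more than this:

```rust
use tokio_postgres::{Client, Error};

enum SyncMethod {
    Logical,     // native logical replication
    XminPolling, // xmin-based fallback
}

// Probe the source and pick a replication strategy.
async fn detect_method(source: &Client) -> Result<SyncMethod, Error> {
    let wal_level: String = source
        .query_one("SELECT current_setting('wal_level')", &[])
        .await?
        .get(0);
    Ok(if wal_level == "logical" {
        SyncMethod::Logical
    } else {
        SyncMethod::XminPolling
    })
}
```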
xmin vs WAL: Trade-offs and Costs
Choosing between xmin-based sync and native logical replication involves trade-offs in latency, resource usage, and operational complexity.
Performance Comparison
| Metric | WAL (Logical Replication) | xmin Polling |
|---|---|---|
| Latency | Sub-second (streaming) | Configurable (default: 1 second) |
| Source CPU | Minimal (reads WAL) | Higher (full table scans) |
| Source I/O | WAL only | Data pages + indexes |
| Delete detection | Immediate | Requires reconciliation |
| Configuration | wal_level=logical + restart + always-on-compute | None |
When to Use Each
Use WAL replication when:
- You control the source database configuration
- You need real-time or near-real-time sync
- The source already has wal_level=logical enabled
- You're replicating high-volume OLTP workloads
Use xmin polling when:
- Source is a managed service without logical replication
- Hourly/daily sync is acceptable
- You can't restart the source database
- You need zero-config replication from any PostgreSQL
Compute Costs (Continuous Sync)
For xmin-based sync running 24/7 on cloud infrastructure:
| Instance | Use Case | Monthly Cost |
|---|---|---|
| t3.micro | Light sync (small DB) | ~$8 |
| t3.small | Medium workloads | ~$15 |
| t3.medium | Heavy sync (many tables) | ~$30 |
WAL replication has no dedicated compute cost—it runs within your existing PostgreSQL process. However, it increases WAL volume and may require larger storage.
Our Recommendation
Start with xmin polling (it works everywhere). If you need lower latency and can enable wal_level=logical, the tool automatically upgrades to native replication—no code changes required.
Fork It
The entire codebase is Apache 2.0 licensed. Key extension points:
- src/xmin/reader.rs - Change detection logic
- src/xmin/writer.rs - Type conversion and batched writes
- src/xmin/daemon.rs - Sync orchestration and scheduling
- src/xmin/reconciler.rs - Delete detection
We'd love contributions for: additional source databases, smarter batching strategies, or webhook notifications on sync events.
Links:
- GitHub: https://github.com/serenorg/database-replicator
- Crates.io: https://crates.io/crates/database-replicator
- SerenDB: https://serendb.com
About SerenAI
SerenAI is building infrastructure for AI agent data access. Agents are hungry for data and they will pay to access the data in your database. We're creating the layer that powers secure, compliant enterprise data commerce and data delivery for AI agents. SerenAI includes agent identity verification, persistent memory via SerenDB, data access control, tiered data-access pricing, SOC2-ready compliance systems, as well as micropayments and settlement.
Our team brings decades of experience building enterprise databases and security systems. We believe AI agents need to pay to access your data.

About Taariq Lewis
Exploring how to make developers faster and more productive with AI agents