Show HN: Database-replicator – Give AI agents controlled access to your data without touching production
TL;DR: We built an open-source CLI that replicates your databases to a separate PostgreSQL instance for AI workloads. Your production stays untouched. You control exactly which tables and schemas AI agents can access. If the AI initiative doesn't work out, just drop the replica—zero impact. GitHub | Crates.io
The Problem: AI Wants Your Data, But Production Is Sacred
Every AI and agentic workflow needs data. Your data. The data sitting in your production databases that your team has spent years stabilizing, optimizing, and protecting.
When someone proposes connecting AI agents to your production database, the reaction is predictable:
"Absolutely not."
And for good reason. Your database team has three legitimate concerns:
1. Don't touch our infrastructure. Production databases are sacred. They've been tuned, monitored, and hardened over years. Adding new, unproven AI workloads introduces unknown query patterns, unpredictable load, and potential instability. Stretched teams don't have bandwidth to babysit experimental AI initiatives.
2. We need absolute control over what AI sees. AI agents shouldn't have access to everything. Maybe they need customer orders but not payment details. Maybe product inventory but not employee records. Fine-grained access control isn't optional—it's mandatory.
3. We need to be able to walk away. AI initiatives fail. Priorities shift. When leadership decides to sunset the AI project, you need to cleanly disconnect without database archaeology, leftover connections, or orphaned permissions cluttering your production system.
The answer isn't to give AI direct access to production. It's replication.
The Solution: A Controlled Replica for AI Workloads
database-replicator (https://github.com/serenorg/database-replicator) creates a separate PostgreSQL instance that mirrors exactly what you want AI agents to see—nothing more, nothing less.
```bash
# Replicate only the tables AI needs (continuous sync stays on unless you pass --no-sync)
database-replicator init \
  --source "postgresql://readonly@prod:5432/mydb" \
  --target "postgresql://admin@ai-replica:5432/mydb" \
  --include-tables "mydb.orders,mydb.products,mydb.inventory"
```

Your production database is read-only in this relationship. The replicator pulls data; it never pushes. Your DBA doesn't need to install anything on production; granting a read-only user the REPLICATION privilege is enough.
The AI replica stays in sync via PostgreSQL's native logical replication. Every INSERT, UPDATE, and DELETE flows automatically. Your AI agents query the replica while production runs undisturbed.
When the AI project ends? Drop the replica database. Delete the subscription. Done. Production never knew it was there.
Selective Replication: You Control What AI Sees
This is where it matters. You're not replicating everything—you're curating a dataset for AI consumption.
Include only what's needed:
```bash
database-replicator init \
  --source $PROD \
  --target $AI_REPLICA \
  --include-tables "ecommerce.orders,ecommerce.products,inventory.stock_levels"
```

Exclude sensitive data:
```bash
database-replicator init \
  --source $PROD \
  --target $AI_REPLICA \
  --exclude-tables "auth.users,billing.payment_methods,hr.employees"
```

Filter by time (only recent data):
```toml
# replication-config.toml
[[databases.mydb.time_filters]]
table = "orders"
column = "created_at"
last = "90 days"

[[databases.mydb.table_filters]]
table = "customers"
where = "region = 'US' AND status = 'active'"
```

The filter predicates are applied at the source. You're not transferring data to the replica and then filtering—the sensitive data never leaves production.
For PostgreSQL 15+, filters are enforced in the publication's WHERE clause, meaning the replication stream itself only contains permitted rows.
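To illustrate how these filters compose into a publication row filter, here is a hypothetical Python sketch. The `build_publication_sql` helper and the publication name `ai_pub` are invented for this example and are not the tool's internals:

```python
from datetime import datetime, timedelta

def build_publication_sql(table, where=None, last=None, time_column=None, now=None):
    # Turn a relative window like "90 days" into a cutoff predicate on
    # time_column, AND it with any explicit row filter, and emit the
    # publication DDL with the combined WHERE clause (PostgreSQL 15+).
    predicates = []
    if last and time_column:
        count, unit = last.split()
        cutoff = (now or datetime.utcnow()) - timedelta(**{unit: int(count)})
        predicates.append(f"{time_column} >= '{cutoff:%Y-%m-%d}'")
    if where:
        predicates.append(where)
    sql = f"CREATE PUBLICATION ai_pub FOR TABLE {table}"
    if predicates:
        sql += " WHERE (" + " AND ".join(predicates) + ")"
    return sql

print(build_publication_sql("customers", where="region = 'US' AND status = 'active'"))
# → CREATE PUBLICATION ai_pub FOR TABLE customers WHERE (region = 'US' AND status = 'active')
```

Because the WHERE clause lives in the publication itself, rows outside the predicate are filtered before they ever enter the replication stream.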
Interactive Filtering (Default)
Don't want to manually specify every table? Interactive mode is enabled by default—no flags needed:
```bash
database-replicator init \
  --source "mysql://readonly@prod:3306/mydb" \
  --target "postgresql://admin@your-db.serendb.com:5432/mydb"
```

The tool connects to your source database, discovers all databases and tables, and presents a terminal UI:
```text
Select databases to replicate:
(Use arrow keys, Space to select, Enter to confirm)

> [x] ecommerce
  [x] analytics
  [ ] staging
  [ ] test_data

Select tables to EXCLUDE from 'ecommerce':
  [ ] orders
  [ ] products
  [x] debug_logs
  [x] temp_cache

========================================
Replication Configuration Summary
========================================

Databases to replicate: 2
  ✓ ecommerce
  ✓ analytics

Tables to exclude: 2
  ✗ ecommerce.debug_logs
  ✗ ecommerce.temp_cache

Proceed with this configuration? [Y/n]:
```

This is particularly valuable when you're exploring a database you didn't design. Instead of guessing table names or writing complex exclusion lists, you browse what's available and click to select. The configuration is saved, so subsequent syncs use the same filters automatically.
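To illustrate the idea of persisting the interactive selection, here is a minimal Python sketch. JSON stands in for the tool's actual on-disk config format, which isn't shown here, and the file path is invented:

```python
import json
import pathlib

def save_selection(path, databases, excluded):
    # Persist the interactive choices so later runs can skip the prompts.
    # Sorting keeps the saved file stable across reorderings.
    path.write_text(json.dumps(
        {"databases": sorted(databases), "exclude": sorted(excluded)},
        indent=2,
    ))

def load_selection(path):
    return json.loads(path.read_text())

p = pathlib.Path("/tmp/replication_selection.json")
save_selection(p, ["ecommerce", "analytics"],
               ["ecommerce.debug_logs", "ecommerce.temp_cache"])
assert load_selection(p)["databases"] == ["analytics", "ecommerce"]
```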
For scripted or automated workflows, use --no-interactive or --yes to disable the prompts and rely on CLI filter flags instead.
Multi-Source Support
Not all your data lives in PostgreSQL. We support replicating from:
- PostgreSQL → PostgreSQL (continuous sync via logical replication)
- MySQL/MariaDB → PostgreSQL (snapshot with periodic refresh)
- MongoDB → PostgreSQL (documents stored as JSONB)
- SQLite → PostgreSQL (one-time or scheduled sync)
From MySQL:
```bash
database-replicator init \
  --source "mysql://readonly@mysql-prod:3306/ecommerce" \
  --target "postgresql://admin@ai-replica:5432/ecommerce" \
  --include-tables "ecommerce.orders,ecommerce.products,ecommerce.customers"
```

From MongoDB:
```bash
# Use database.collection when filtering Mongo collections
database-replicator init \
  --source "mongodb://readonly@mongo-prod:27017/analytics" \
  --target "postgresql://admin@ai-replica:5432/analytics" \
  --include-tables "analytics.events,analytics.user_sessions"
```

From SQLite:
```bash
database-replicator init \
  --source "sqlite:///path/to/app.db" \
  --target "postgresql://admin@ai-replica:5432/appdata" \
  --local
```

This means you can consolidate data from multiple sources into a single PostgreSQL replica optimized for AI queries—without touching any of the source systems beyond read access.
Commercial Database Support: On Our Roadmap
We started with the leading open-source databases, but enterprise data lives everywhere. We're actively working on support for commercial databases:
Coming soon:
- Oracle Database
- Microsoft SQL Server
- IBM Db2
- SAP HANA
- Teradata
- Snowflake
- Amazon Redshift
- Google BigQuery
- Azure Synapse Analytics
- Databricks (Delta Lake)
Each of these will follow the same pattern: read-only access to the source, selective replication with filtering, continuous or scheduled sync to your AI-ready PostgreSQL replica.
We welcome forks and contributions for commercial database sources. If your organization needs replication from a specific commercial database, fork the repo and build it. We're happy to review PRs that add new source connectors, and we'll help with architecture questions. The goal is comprehensive coverage—wherever your data lives, you should be able to replicate it safely for AI workloads.
How It Works
For PostgreSQL sources, we use native logical replication:
- Validate - Verify the source has `wal_level = logical` and the user has the REPLICATION privilege. Check that the target can create subscriptions. No changes to the source required.
- Initial Snapshot - Parallel `pg_dump` of the selected tables only. Schema and data are transferred to the replica.
- Create Publication - The source publishes changes for the specified tables. This is a read-only operation from the source's perspective.
- Create Subscription - Replica subscribes to the publication. PostgreSQL handles continuous sync automatically.
- Verify - Checksums confirm data integrity between source and replica.
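The verification step can be sketched as an order-independent checksum per table. This is one common approach, not necessarily the tool's actual scheme:

```python
import hashlib

def table_checksum(rows):
    # Hash each row individually and XOR the digests: the result is
    # independent of scan order, so source and replica can be compared
    # without sorting either side.
    acc = 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode()).digest()
        acc ^= int.from_bytes(digest, "big")
    return f"{acc:064x}"

source_rows = [(1, "widget"), (2, "gadget")]
replica_rows = [(2, "gadget"), (1, "widget")]  # same rows, different order
assert table_checksum(source_rows) == table_checksum(replica_rows)
```

A mismatch between the two checksums flags the table for a closer row-level comparison.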
The replica is always slightly behind (typically milliseconds to seconds). For AI workloads, this latency is irrelevant—agents don't need real-time data to analyze trends, generate reports, or answer questions.
SerenAI Cloud Execution
Running replication locally means your laptop needs to stay connected for hours during the initial sync. For large databases, that's impractical.
When replicating to SerenDB (our managed PostgreSQL for AI), the job runs on our infrastructure:
```bash
export SEREN_API_KEY="your-key"  # from console.serendb.com
database-replicator init \
  --source "postgresql://readonly@your-prod:5432/db" \
  --target "postgresql://admin@your-db.serendb.com:5432/db"
```

We provision a resilient cloud worker, run the replication, stream progress to your terminal, and clean up when done. Your laptop can disconnect—the job continues.
For non-SerenDB targets, use --local and it runs on your machine.
Technical Details
Written in Rust for performance and reliability. Cross-compiled for Linux, macOS Intel, and macOS ARM.
Wraps pg_dump/pg_restore rather than reimplementing. These tools are battle-tested across millions of databases. We add retry logic, progress tracking, and credential handling.
Checkpoint system for resume support. Long-running initial syncs can be interrupted and resumed. Checkpoints include a fingerprint of your filter configuration—changing filters invalidates checkpoints to prevent data inconsistency.
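A minimal sketch of such a fingerprint, assuming a JSON canonicalization (the helper name and scheme are illustrative, not the tool's real implementation):

```python
import hashlib
import json

def filter_fingerprint(include_tables, exclude_tables, table_filters):
    # Sort everything so the hash is stable regardless of flag order;
    # any real change to the filters produces a different fingerprint,
    # which is what invalidates a saved checkpoint.
    canonical = json.dumps(
        {
            "include": sorted(include_tables),
            "exclude": sorted(exclude_tables),
            "filters": sorted(table_filters.items()),
        },
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

a = filter_fingerprint(["mydb.orders", "mydb.products"], [],
                       {"orders": "created_at >= now() - interval '90 days'"})
b = filter_fingerprint(["mydb.products", "mydb.orders"], [],
                       {"orders": "created_at >= now() - interval '90 days'"})
assert a == b  # flag order doesn't change the fingerprint
```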
TCP keepalives configured automatically for connections through load balancers. No more mysterious timeouts during large transfers.
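Conceptually, the keepalive configuration looks like this in Python; the timing values are illustrative, and the Linux-only `TCP_KEEPIDLE`/`TCP_KEEPINTVL`/`TCP_KEEPCNT` options are guarded:

```python
import socket

def keepalive_socket():
    # Probe idle connections so a load balancer's idle timeout doesn't
    # silently drop a multi-hour replication transfer.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):  # Linux; macOS exposes TCP_KEEPALIVE instead
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # first probe after 60s idle
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # then every 10s
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # declare dead after 5 misses
    return s

sock = keepalive_socket()
sock.close()
```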
Credentials never exposed in process arguments. We use .pgpass files with proper permissions, cleaned up automatically.
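A minimal sketch of the `.pgpass` approach (the helper and path are invented for illustration; real entries must also escape `:` and `\` in fields, omitted here for brevity):

```python
import os
import stat

def write_pgpass(path, host, port, db, user, password):
    # One pgpass line: host:port:database:user:password. Keeping the
    # password in this file (not on the command line) keeps it out of
    # `ps` output. libpq ignores the file unless its mode is 0600.
    line = f"{host}:{port}:{db}:{user}:{password}\n"
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(line)
    os.chmod(path, 0o600)  # enforce 0600 even if the file pre-existed
    return oct(stat.S_IMODE(os.stat(path).st_mode))

mode = write_pgpass("/tmp/demo_pgpass", "prod", 5432, "mydb", "readonly", "s3cret")
# mode == "0o600"
```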
Limitations
- PostgreSQL targets only - We replicate only to PostgreSQL, from any of the supported source types
- Logical replication requires PostgreSQL 10+ on source (12+ recommended)
- DDL changes need manual handling - Logical replication doesn't capture schema changes
- Read-only replicas - The replica receives data; it's not bidirectional sync
Getting Started
```bash
# Install from crates.io
cargo install database-replicator

# Or download binaries from GitHub releases
```

Basic usage:
```bash
# Validate prerequisites
database-replicator validate \
  --source "postgresql://readonly@prod:5432/mydb" \
  --target "postgresql://admin@ai-replica:5432/mydb"

# Create replica with continuous sync (enabled by default; add --no-sync to disable)
database-replicator init \
  --source "postgresql://readonly@prod:5432/mydb" \
  --target "postgresql://admin@ai-replica:5432/mydb" \
  --include-tables "mydb.orders,mydb.products,mydb.customers"

# Monitor replication status
database-replicator status \
  --source "postgresql://readonly@prod:5432/mydb" \
  --target "postgresql://admin@ai-replica:5432/mydb"

# Verify data integrity
database-replicator verify \
  --source "postgresql://readonly@prod:5432/mydb" \
  --target "postgresql://admin@ai-replica:5432/mydb" \
  --include-tables "mydb.orders,mydb.products,mydb.customers"
```

Why We Built This
At SerenAI, we're building the world's largest agentic-data marketplace, where AI agents pay to access data from databases and from other agents. Our users have no desire, and no need, to migrate away from their existing infrastructure—they want to keep production exactly as it is while giving AI controlled access to specific data.
Replication solves this cleanly. Production stays untouched. AI gets a curated view. Teams maintain full control. And when priorities change, cleanup is trivial.
The tool is Apache 2.0 licensed. We'd love contributions, especially around real-time CDC for MySQL sources.
About SerenAI
SerenAI is building infrastructure for AI agent data access. Agents are hungry for data, and they will pay to access the data in your database. We're creating the layer that powers secure, compliant enterprise data commerce and data delivery for AI agents. SerenAI includes agent identity verification, persistent memory via SerenDB, data access control, tiered data-access pricing, SOC 2-ready compliance systems, and micropayments and settlement.
Our team brings decades of experience building enterprise databases and security systems. We believe AI agents need to pay to access your data.
Get in touch: hello@serendb.com | serendb.com
GitHub | Documentation | SerenAI Console
Happy to answer questions about the architecture, selective replication patterns, or anything else. Find us at http://serendb.com

About Taariq Lewis
Exploring how to make developers faster and more productive with AI agents