Show HN: Database-replicator – Give AI agents controlled access to your data without touching production
TL;DR: We built an open-source CLI that replicates your databases to a separate PostgreSQL instance for AI workloads. Your production stays untouched. You control exactly which tables and schemas AI agents can access. If the AI initiative doesn't work out, just drop the replica—zero impact. GitHub | Crates.io
The Problem: AI Wants Your Data, But Production Is Sacred
Every AI and agentic workflow needs data. Your data. The data sitting in your production databases that your team has spent years stabilizing, optimizing, and protecting.
When someone proposes connecting AI agents to your production database, the reaction is predictable:
"Absolutely not."
And for good reason. Your database team has three legitimate concerns:
1. Don't touch our infrastructure. Production databases are sacred. They've been tuned, monitored, and hardened over years. Adding new, unproven AI workloads introduces unknown query patterns, unpredictable load, and potential instability. Stretched teams don't have bandwidth to babysit experimental AI initiatives.
2. We need absolute control over what AI sees. AI agents shouldn't have access to everything. Maybe they need customer orders but not payment details. Maybe product inventory but not employee records. Fine-grained access control isn't optional—it's mandatory.
3. We need to be able to walk away. AI initiatives fail. Priorities shift. When leadership decides to sunset the AI project, you need to cleanly disconnect without database archaeology, leftover connections, or orphaned permissions cluttering your production system.
The answer isn't to give AI direct access to production. It's replication.
The Solution: A Controlled Replica for AI Workloads
database-replicator (https://github.com/serenorg/database-replicator) creates a separate PostgreSQL instance that mirrors exactly what you want AI agents to see—nothing more, nothing less.
```bash
# Replicate only the tables AI needs (continuous sync stays on unless you pass --no-sync)
database-replicator init \
  --source "postgresql://readonly@prod:5432/mydb" \
  --target "postgresql://admin@ai-replica:5432/mydb" \
  --include-tables "mydb.orders,mydb.products,mydb.inventory"
```

Your production database is read-only in this relationship. The replicator pulls data; it never pushes. Your DBA doesn't need to install anything on production; granting a read-only user the REPLICATION privilege is enough.
The AI replica stays in sync via PostgreSQL's native logical replication. Every INSERT, UPDATE, and DELETE flows automatically. Your AI agents query the replica while production runs undisturbed.
When the AI project ends? Drop the replica database. Delete the subscription. Done. Production never knew it was there.
Selective Replication: You Control What AI Sees
This is where it matters. You're not replicating everything—you're curating a dataset for AI consumption.
Include only what's needed:
```bash
database-replicator init \
  --source $PROD \
  --target $AI_REPLICA \
  --include-tables "ecommerce.orders,ecommerce.products,inventory.stock_levels"
```

Exclude sensitive data:
```bash
database-replicator init \
  --source $PROD \
  --target $AI_REPLICA \
  --exclude-tables "auth.users,billing.payment_methods,hr.employees"
```

Filter by time (only recent data):
```toml
# replication-config.toml
[[databases.mydb.time_filters]]
table = "orders"
column = "created_at"
last = "90 days"

[[databases.mydb.table_filters]]
table = "customers"
where = "region = 'US' AND status = 'active'"
```

The filter predicates are applied at the source. You're not transferring data to the replica and then filtering—the sensitive data never leaves production.
For PostgreSQL 15+, filters are enforced in the publication's WHERE clause, meaning the replication stream itself only contains permitted rows.
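To illustrate how these filters compose into a publication row filter, here is a hypothetical Python sketch. The `build_publication_sql` helper and the publication name `ai_pub` are invented for this example and are not the tool's internals:

```python
from datetime import datetime, timedelta

def build_publication_sql(table, where=None, last=None, time_column=None, now=None):
    # Turn a relative window like "90 days" into a cutoff predicate on
    # time_column, AND it with any explicit row filter, and emit the
    # publication DDL with the combined WHERE clause (PostgreSQL 15+).
    predicates = []
    if last and time_column:
        count, unit = last.split()
        cutoff = (now or datetime.utcnow()) - timedelta(**{unit: int(count)})
        predicates.append(f"{time_column} >= '{cutoff:%Y-%m-%d}'")
    if where:
        predicates.append(where)
    sql = f"CREATE PUBLICATION ai_pub FOR TABLE {table}"
    if predicates:
        sql += " WHERE (" + " AND ".join(predicates) + ")"
    return sql

print(build_publication_sql("customers", where="region = 'US' AND status = 'active'"))
# → CREATE PUBLICATION ai_pub FOR TABLE customers WHERE (region = 'US' AND status = 'active')
```

Because the WHERE clause lives in the publication itself, rows outside the predicate are filtered before they ever enter the replication stream.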
Interactive Filtering (Default)
Don't want to manually specify every table? Interactive mode is enabled by default—no flags needed:
```bash
database-replicator init \
  --source "mysql://readonly@prod:3306/mydb" \
  --target "postgresql://admin@your-db.serendb.com:5432/mydb"
```

The tool connects to your source database, discovers all databases and tables, and presents a terminal UI:
```text
Select databases to replicate:
(Use arrow keys, Space to select, Enter to confirm)

> [x] ecommerce
  [x] analytics
  [ ] staging
  [ ] test_data

Select tables to EXCLUDE from 'ecommerce':
  [ ] orders
  [ ] products
  [x] debug_logs
  [x] temp_cache

========================================
Replication Configuration Summary
========================================

Databases to replicate: 2
  ✓ ecommerce
  ✓ analytics

Tables to exclude: 2
  ✗ ecommerce.debug_logs
  ✗ ecommerce.temp_cache

Proceed with this configuration? [Y/n]:
```

This is particularly valuable when you're exploring a database you didn't design. Instead of guessing table names or writing complex exclusion lists, you browse what's available and click to select. The configuration is saved, so subsequent syncs use the same filters automatically.
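To illustrate the idea of persisting the interactive selection, here is a minimal Python sketch. JSON stands in for the tool's actual on-disk config format, which isn't shown here, and the file path is invented:

```python
import json
import pathlib

def save_selection(path, databases, excluded):
    # Persist the interactive choices so later runs can skip the prompts.
    # Sorting keeps the saved file stable across reorderings.
    path.write_text(json.dumps(
        {"databases": sorted(databases), "exclude": sorted(excluded)},
        indent=2,
    ))

def load_selection(path):
    return json.loads(path.read_text())

p = pathlib.Path("/tmp/replication_selection.json")
save_selection(p, ["ecommerce", "analytics"],
               ["ecommerce.debug_logs", "ecommerce.temp_cache"])
assert load_selection(p)["databases"] == ["analytics", "ecommerce"]
```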
For scripted or automated workflows, use --no-interactive or --yes to disable the prompts and rely on CLI filter flags instead.
Multi-Source Support
Not all your data lives in PostgreSQL. We support replicating from:
- PostgreSQL → PostgreSQL (continuous sync via logical replication)
- MySQL/MariaDB → PostgreSQL (snapshot with periodic refresh)
- MongoDB → PostgreSQL (documents stored as JSONB)
- SQLite → PostgreSQL (one-time or scheduled sync)
From MySQL:
```bash
database-replicator init \
  --source "mysql://readonly@mysql-prod:3306/ecommerce" \
  --target "postgresql://admin@ai-replica:5432/ecommerce" \
  --include-tables "ecommerce.orders,ecommerce.products,ecommerce.customers"
```

From MongoDB:
```bash
# Use database.collection when filtering Mongo collections
database-replicator init \
  --source "mongodb://readonly@mongo-prod:27017/analytics" \
  --target "postgresql://admin@ai-replica:5432/analytics" \
  --include-tables "analytics.events,analytics.user_sessions"
```

From SQLite:
```bash
database-replicator init \
  --source "sqlite:///path/to/app.db" \
  --target "postgresql://admin@ai-replica:5432/appdata" \
  --local
```

This means you can consolidate data from multiple sources into a single PostgreSQL replica optimized for AI queries—without touching any of the source systems beyond read access.
Commercial Database Support: On Our Roadmap
We started with the leading open-source databases, but enterprise data lives everywhere. We're actively working on support for commercial databases:
Coming soon:
- Oracle Database
- Microsoft SQL Server
- IBM Db2
- SAP HANA
- Teradata
- Snowflake
- Amazon Redshift
- Google BigQuery
- Azure Synapse Analytics
- Databricks (Delta Lake)
Each of these will follow the same pattern: read-only access to the source, selective replication with filtering, continuous or scheduled sync to your AI-ready PostgreSQL replica.
We welcome forks and contributions for commercial database sources. If your organization needs replication from a specific commercial database, fork the repo and build it. We're happy to review PRs that add new source connectors, and we'll help with architecture questions. The goal is comprehensive coverage—wherever your data lives, you should be able to replicate it safely for AI workloads.
How It Works
For PostgreSQL sources, we use native logical replication:
- Validate - Verify the source has `wal_level = logical` and the user has the REPLICATION privilege. Check that the target can create subscriptions. No changes to the source required.
- Initial Snapshot - Parallel `pg_dump` of the selected tables only. Schema and data are transferred to the replica.
- Create Publication - The source publishes changes for the specified tables. This is a read-only operation from the source's perspective.
- Create Subscription - Replica subscribes to the publication. PostgreSQL handles continuous sync automatically.
- Verify - Checksums confirm data integrity between source and replica.
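The verification step can be sketched as an order-independent checksum per table. This is one common approach, not necessarily the tool's actual scheme:

```python
import hashlib

def table_checksum(rows):
    # Hash each row individually and XOR the digests: the result is
    # independent of scan order, so source and replica can be compared
    # without sorting either side.
    acc = 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode()).digest()
        acc ^= int.from_bytes(digest, "big")
    return f"{acc:064x}"

source_rows = [(1, "widget"), (2, "gadget")]
replica_rows = [(2, "gadget"), (1, "widget")]  # same rows, different order
assert table_checksum(source_rows) == table_checksum(replica_rows)
```

A mismatch between the two checksums flags the table for a closer row-level comparison.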
The replica is always slightly behind (typically milliseconds to seconds). For AI workloads, this latency is irrelevant—agents don't need real-time data to analyze trends, generate reports, or answer questions.
SerenAI Cloud Execution
Running replication locally means your laptop needs to stay connected for hours during the initial sync. For large databases, that's impractical.
When replicating to SerenDB (our managed PostgreSQL for AI), the job runs on our infrastructure:
```bash
export SEREN_API_KEY="your-key"  # from console.serendb.com
database-replicator init \
  --source "postgresql://readonly@your-prod:5432/db" \
  --target "postgresql://admin@your-db.serendb.com:5432/db"
```

We provision a resilient cloud worker, run the replication, stream progress to your terminal, and clean up when done. Your laptop can disconnect—the job continues.
For non-SerenDB targets, use --local and it runs on your machine.
Technical Details
Written in Rust for performance and reliability. Cross-compiled for Linux, macOS Intel, and macOS ARM.
Wraps pg_dump/pg_restore rather than reimplementing. These tools are battle-tested across millions of databases. We add retry logic, progress tracking, and credential handling.
Checkpoint system for resume support. Long-running initial syncs can be interrupted and resumed. Checkpoints include a fingerprint of your filter configuration—changing filters invalidates checkpoints to prevent data inconsistency.
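A minimal sketch of such a fingerprint, assuming a JSON canonicalization (the helper name and scheme are illustrative, not the tool's real implementation):

```python
import hashlib
import json

def filter_fingerprint(include_tables, exclude_tables, table_filters):
    # Sort everything so the hash is stable regardless of flag order;
    # any real change to the filters produces a different fingerprint,
    # which is what invalidates a saved checkpoint.
    canonical = json.dumps(
        {
            "include": sorted(include_tables),
            "exclude": sorted(exclude_tables),
            "filters": sorted(table_filters.items()),
        },
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

a = filter_fingerprint(["mydb.orders", "mydb.products"], [],
                       {"orders": "created_at >= now() - interval '90 days'"})
b = filter_fingerprint(["mydb.products", "mydb.orders"], [],
                       {"orders": "created_at >= now() - interval '90 days'"})
assert a == b  # flag order doesn't change the fingerprint
```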
TCP keepalives configured automatically for connections through load balancers. No more mysterious timeouts during large transfers.
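Conceptually, the keepalive configuration looks like this in Python; the timing values are illustrative, and the Linux-only `TCP_KEEPIDLE`/`TCP_KEEPINTVL`/`TCP_KEEPCNT` options are guarded:

```python
import socket

def keepalive_socket():
    # Probe idle connections so a load balancer's idle timeout doesn't
    # silently drop a multi-hour replication transfer.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):  # Linux; macOS exposes TCP_KEEPALIVE instead
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # first probe after 60s idle
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # then every 10s
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # declare dead after 5 misses
    return s

sock = keepalive_socket()
sock.close()
```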
Credentials never exposed in process arguments. We use .pgpass files with proper permissions, cleaned up automatically.
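A minimal sketch of the `.pgpass` approach (the helper and path are invented for illustration; real entries must also escape `:` and `\` in fields, omitted here for brevity):

```python
import os
import stat

def write_pgpass(path, host, port, db, user, password):
    # One pgpass line: host:port:database:user:password. Keeping the
    # password in this file (not on the command line) keeps it out of
    # `ps` output. libpq ignores the file unless its mode is 0600.
    line = f"{host}:{port}:{db}:{user}:{password}\n"
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(line)
    os.chmod(path, 0o600)  # enforce 0600 even if the file pre-existed
    return oct(stat.S_IMODE(os.stat(path).st_mode))

mode = write_pgpass("/tmp/demo_pgpass", "prod", 5432, "mydb", "readonly", "s3cret")
# mode == "0o600"
```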
Limitations
- PostgreSQL targets only - We replicate only to PostgreSQL, from any of the supported source types
- Logical replication requires PostgreSQL 10+ on source (12+ recommended)
- DDL changes need manual handling - Logical replication doesn't capture schema changes
- Read-only replicas - The replica receives data; it's not bidirectional sync
Getting Started
```bash
# Install from crates.io
cargo install database-replicator

# Or download binaries from GitHub releases
```

Basic usage:
```bash
# Validate prerequisites
database-replicator validate \
  --source "postgresql://readonly@prod:5432/mydb" \
  --target "postgresql://admin@ai-replica:5432/mydb"

# Create replica with continuous sync (enabled by default; add --no-sync to disable)
database-replicator init \
  --source "postgresql://readonly@prod:5432/mydb" \
  --target "postgresql://admin@ai-replica:5432/mydb" \
  --include-tables "mydb.orders,mydb.products,mydb.customers"

# Monitor replication status
database-replicator status \
  --source "postgresql://readonly@prod:5432/mydb" \
  --target "postgresql://admin@ai-replica:5432/mydb"

# Verify data integrity
database-replicator verify \
  --source "postgresql://readonly@prod:5432/mydb" \
  --target "postgresql://admin@ai-replica:5432/mydb" \
  --include-tables "mydb.orders,mydb.products,mydb.customers"
```

Why We Built This
At SerenAI, we're building the world's largest agentic-data marketplace, where AI agents pay to access data from databases and from other agents. Our users have no desire, and no need, to migrate away from their existing infrastructure—they want to keep production exactly as it is while giving AI controlled access to specific data.
Replication solves this cleanly. Production stays untouched. AI gets a curated view. Teams maintain full control. And when priorities change, cleanup is trivial.
The tool is Apache 2.0 licensed. We'd love contributions, especially around real-time CDC for MySQL sources.
About SerenAI
SerenAI is building infrastructure for AI agent data access. Agents are hungry for data, and they will pay to access the data in your database. We're creating the layer that powers secure, compliant enterprise data commerce and data delivery for AI agents. SerenAI includes agent identity verification, persistent memory via SerenDB, data access control, tiered data-access pricing, SOC 2-ready compliance systems, and micropayments and settlement.
Our team brings decades of experience building enterprise databases and security systems. We believe AI agents need to pay to access your data.
Get in touch: hello@serendb.com | serendb.com
GitHub | Documentation | SerenAI Console
Happy to answer questions about the architecture, selective replication patterns, or anything else. Find us at http://serendb.com

About Taariq Lewis
Exploring how to make developers faster and more productive with AI agents