Wrestling with LLMs: The Right Tool for the Job

I don't know Zig. I've never written a Zig program. But this afternoon I rewrote a multi-runtime PostgreSQL change tracking system as a single Zig binary — 3.7 MB, zero dependencies, 150x less memory — in about two hours.


What Bemi Does

Bemi is an open source tool for automatic database change tracking. It connects to PostgreSQL's Write-Ahead Log (WAL) and captures every INSERT, UPDATE, DELETE, and TRUNCATE. Audit trails, time travel, debugging — the use cases are real.

The architecture is not. Bemi chains four processes across three runtimes:

PostgreSQL WAL ──> Debezium (Java) ──> NATS (Go) ──> Worker (Node.js) ──> PostgreSQL

Debezium reads the WAL and publishes to NATS. NATS queues messages. A Node.js worker consumes from NATS and writes to PostgreSQL. Three languages, three runtimes, four processes, a 3.2 GB Docker image. For what is fundamentally a single-producer, single-consumer pipeline.

The JVM alone takes 30-60 seconds to start and 300+ MB of memory at rest. NATS adds JetStream configuration, stream management, consumer groups — operational complexity for a pipeline that has exactly one input and one output. The Node.js worker pulls in MikroORM, pnpm, and a full npm dependency tree to do what amounts to JSON serialization and a SQL INSERT.

I was using Bemi at work to track critical user journeys for accessibility validation — a greenfield project where change data capture made real sense. It worked. But the dev experience was slow, and when I went to demo it, the startup time killed the room. Waiting 30-60 seconds for the JVM to warm up before anything happens gives a terrible first impression. The idea was sound. The implementation was three runtimes in a Docker-shaped box.


Why Zig

The problem is simple: read bytes from one PostgreSQL connection, transform them, write bytes to another PostgreSQL connection. No HTTP server. No user interface. No plugin system. Just a tight loop over a binary protocol.

This is a systems programming problem. You want:

  • Predictable memory usage (no GC pauses, no heap growth)
  • A single static binary (no runtime, no dependencies)
  • Direct control over TCP connections and byte parsing
  • Cross-compilation to every target from one machine

I knew about Zig because of Ghostty, the terminal emulator built by Mitchell Hashimoto, the founder of HashiCorp where I work. His writing about Zig sold me on the language's design. I'd been following it — I liked the C interop story, the developer friendliness, and the way it shared many of Rust's goals with less rigidity and ceremony. It seemed like exactly the right fit for this kind of byte-level protocol work.

What sealed it was the standard library. Zig's stdlib is unusually comprehensive for a systems language. The things this project needs — that in C or Rust would mean pulling in external dependencies — are just there:

  • TLS — std.crypto.tls.Client gives you TLS 1.2/1.3 with no OpenSSL dependency. The entire SSL implementation for PostgreSQL connections is stdlib. No linking, no vendoring, no pkg-config.
  • Cryptography — HMAC-SHA-256, PBKDF2, MD5, SHA-256 all in stdlib. The SCRAM-SHA-256 authentication (RFC 5802, extended to SHA-256 by RFC 7677) uses zero external packages. In most languages, that's at least one or two crypto library dependencies.
  • Binary protocol encoding — PostgreSQL's wire protocol is big-endian. std.mem.readInt and writeInt with .big handle that in one call instead of manual byte shuffling.
  • Atomics — std.atomic.Value for thread-safe Prometheus metrics counters. No mutex, no external concurrency library.
  • Cross-compilation — Four targets (x86_64/aarch64, Linux/macOS) from a single build.zig. No toolchain setup, no CI matrix of build machines.
  • Leak detection — std.testing.allocator catches memory leaks in every unit test automatically.
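The big-endian point is easy to show. Here is an illustrative sketch (my own, not the project's actual code) of parsing a PostgreSQL backend message header, where a one-byte type tag is followed by a big-endian u32 length that includes the length field itself:

```zig
const std = @import("std");

// PostgreSQL backend messages start with a 1-byte type tag followed by a
// big-endian u32 length (which counts the length field itself).
const MessageHeader = struct {
    tag: u8,
    len: u32,
};

fn parseHeader(bytes: *const [5]u8) MessageHeader {
    return .{
        .tag = bytes[0],
        // One call instead of manual byte shuffling.
        .len = std.mem.readInt(u32, bytes[1..5], .big),
    };
}

test "parse a CopyData ('d') header" {
    const wire = [5]u8{ 'd', 0x00, 0x00, 0x00, 0x0B };
    const header = parseHeader(&wire);
    try std.testing.expectEqual(@as(u8, 'd'), header.tag);
    try std.testing.expectEqual(@as(u32, 11), header.len);
}
```

Because readInt takes a pointer to a fixed-size array, the byte count is checked at compile time rather than at runtime.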

The result (I named it Zemi): 5,500 lines of Zig, zero external dependencies, a 3.7 MB static binary, a 1 MB Docker image. Memory usage is 2.8 MB at runtime. It starts in under a millisecond.

But I don't know Zig. I've never written a line of it.


Before AI: You Use What You Know

Here's how this decision used to work. You have a problem that's best solved in language X. But you know language Y. Learning X takes weeks. Shipping in Y takes days. You ship in Y.

This is rational. The cost of learning a new language isn't just syntax — it's the standard library, the build system, the debugging tools, the ecosystem conventions, the edge cases you only learn by shipping real code. For a side project, that cost almost never pencils out.

So you end up with JavaScript doing systems programming. Ruby doing real-time processing. Python doing everything because it's good enough at everything. Not because these are the right tools — because they're the tools you already own.

The original Bemi is a perfect example. It uses Debezium (Java) for WAL reading because Debezium exists and is well-known. It uses NATS (Go) for message passing because NATS exists and is well-known. It uses Node.js for the worker because the team knows JavaScript. Each choice makes sense in isolation. Together, they produce a 3.2 GB image that needs 500 MB of RAM to do what a 3.7 MB binary can do in 2.8 MB.


After AI: You Use What's Right

A coding agent doesn't care what language I know. It knows all of them. When I said "implement the PostgreSQL logical replication protocol in Zig," it didn't ask me to learn Zig first. It just wrote the code.

I can follow the protocol flow — startup, authentication, replication slot management, WAL streaming, pgoutput decoding, change persistence. I understand what the code does. But I'd be lying if I said I understand all the Zig. The allocator patterns, the comptime mechanics, the error handling conventions — a lot of it is still opaque to me. I can read it well enough to review, but I couldn't write it from scratch.

That's an uncomfortable place to be. But it's also why the tests matter so much — more on that below.

What I didn't have to do:

  • Learn Zig's build system (zig build, build.zig, cross-compilation targets)
  • Learn Zig's standard library (ArrayList, allocators, TCP sockets)
  • Debug Zig-specific memory management patterns
  • Figure out Zig's testing framework
  • Write a PostgreSQL protocol parser from scratch in an unfamiliar language

Each of those would have been a day or more. Together, they would have killed the project. I would have reached for JavaScript, shipped something bigger and slower, and moved on.

Instead, I spent two hours reviewing, testing, and iterating on code that a coding agent wrote in the right language for the problem.


What Two Hours Looks Like

I had the coding agent TDD the whole thing against a real PostgreSQL instance from the start. Write a test, make it pass, write the next test. The E2E tests weren't an afterthought — they were the driver. Each passing test was early validation that the approach was working, and the speed of progress was immediately obvious.

Two hours got me to a working state: the core replication pipeline, E2E tests, CI with cross-compilation for four targets. Six commits.
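That cross-compilation setup fits in one short build file. A minimal sketch, assuming the Zig 0.13-era build API (this is my illustration, not the project's actual build.zig):

```zig
const std = @import("std");

// The four release targets, all cross-compiled from a single machine.
const targets: []const std.Target.Query = &.{
    .{ .cpu_arch = .x86_64, .os_tag = .linux },
    .{ .cpu_arch = .aarch64, .os_tag = .linux },
    .{ .cpu_arch = .x86_64, .os_tag = .macos },
    .{ .cpu_arch = .aarch64, .os_tag = .macos },
};

pub fn build(b: *std.Build) !void {
    for (targets) |query| {
        const exe = b.addExecutable(.{
            .name = "zemi",
            .root_source_file = b.path("src/main.zig"),
            .target = b.resolveTargetQuery(query),
            .optimize = .ReleaseSafe,
        });
        // Install each binary under a target-named directory,
        // e.g. zig-out/x86_64-linux/zemi.
        const install = b.addInstallArtifact(exe, .{
            .dest_dir = .{ .override = .{ .custom = try query.zigTriple(b.allocator) } },
        });
        b.getInstallStep().dependOn(&install.step);
    }
}
```

One `zig build` produces all four binaries; there is no per-target toolchain to install.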

Getting to something deployable took another day. SSL/TLS support (using Zig's built-in TLS client — no OpenSSL), SCRAM-SHA-256 authentication (a full implementation of RFC 5802 as extended by RFC 7677 — the original Bemi only supported MD5), graceful shutdown, automatic reconnection, Prometheus metrics, and tracking down a memory leak. The kind of production hardening that separates "it works on my machine" from "I'd run this at work." By the end of that day: 32 commits, 5,500 lines of Zig, 58 unit tests, 46 E2E assertions, a PostgreSQL version matrix (14 through 17), and throughput benchmarks proving 12-21x speed improvements over the original.
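To give a sense of what "zero external packages" means for SCRAM, the client proof from RFC 5802 reduces to a handful of stdlib calls. A hedged sketch, assuming Zig 0.13-era std.crypto names (my illustration, not Zemi's actual code):

```zig
const std = @import("std");
const HmacSha256 = std.crypto.auth.hmac.sha2.HmacSha256;
const Sha256 = std.crypto.hash.sha2.Sha256;

// ClientProof per RFC 5802: SaltedPassword via PBKDF2, then two HMACs,
// one hash, and an XOR. auth_message is the concatenation
// "client-first-bare,server-first,client-final-without-proof".
fn clientProof(
    password: []const u8,
    salt: []const u8,
    iterations: u32,
    auth_message: []const u8,
) ![32]u8 {
    var salted: [32]u8 = undefined;
    try std.crypto.pwhash.pbkdf2(&salted, password, salt, iterations, HmacSha256);

    var client_key: [32]u8 = undefined;
    HmacSha256.create(&client_key, "Client Key", &salted);

    var stored_key: [32]u8 = undefined;
    Sha256.hash(&client_key, &stored_key, .{});

    var signature: [32]u8 = undefined;
    HmacSha256.create(&signature, auth_message, &stored_key);

    var proof: [32]u8 = undefined;
    for (&proof, client_key, signature) |*p, k, s| p.* = k ^ s;
    return proof;
}
```

Every primitive here ships with the compiler; the same function in Node.js or Java typically means at least one crypto dependency plus its transitive tree.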


Testing the Real Thing

I've written before about how mocked tests give you green checkmarks and broken software. This project was no different — the only way to know that a PostgreSQL replication client works is to replicate from a real PostgreSQL server.

The E2E suite spins up three PostgreSQL instances (MD5 auth, SCRAM-SHA-256, and SSL), creates tables, and runs Zemi against them. Fifteen test groups with 46 assertions verify the things that actually matter:

  • INSERT, UPDATE, DELETE, and TRUNCATE all produce the correct change records
  • Multi-table tracking works across different tables
  • Context stitching works (application metadata gets attached to the right change)
  • REPLICA IDENTITY FULL captures before-values on UPDATE
  • The changes table doesn't trigger recursive tracking (no infinite loop)
  • SCRAM-SHA-256 authentication connects and tracks changes
  • SSL in three modes: require, verify-ca, and verify-full
  • Storage reconnection recovers automatically
  • Graceful shutdown cleans up replication slots
  • Table filtering only tracks specified tables
  • Prometheus metrics endpoint returns valid data

Every assertion queries the changes table and checks that real data arrived. Not "did the decoder parse this byte correctly?" but "did the INSERT I just ran in PostgreSQL show up in the changes table with the right before/after values?"

This is a project in a language I'm new to. A lot of the Zig is still opaque to me. The tests are the only reason I trust it. If the E2E suite passes, I know the binary does what the original Bemi does — regardless of whether I can spot a bug in Zig's allocator patterns by reading the source. Manual testing and seeing it work gives me more confidence than "I read the code and it looks right to me" ever could in a language I've never written.

CI runs the full suite on every push: 58 unit tests, formatting checks, cross-compilation for four targets, E2E against PostgreSQL 14 through 17, and a Docker build check. If any of it fails, nothing ships.
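The leak detection mentioned earlier comes for free in this setup: std.testing.allocator reports any allocation that isn't freed by the end of a test, so every unit test doubles as a leak check. An illustrative example (not one of the project's 58 tests), assuming the Zig 0.13/0.14-era ArrayList API:

```zig
const std = @import("std");

test "buffer is freed before the test ends" {
    // std.testing.allocator fails the test with a leak report if anything
    // allocated here is still live when the test returns.
    const allocator = std.testing.allocator;

    var list = std.ArrayList(u8).init(allocator);
    defer list.deinit(); // delete this line and the test fails with a leak

    try list.appendSlice("BEGIN;INSERT;COMMIT;");
    try std.testing.expectEqual(@as(usize, 20), list.items.len);
}
```

There is no Valgrind run or sanitizer flag to remember; the check rides along with `zig build test` on every push.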


The Numbers

Metric          Original Bemi                         Zemi             Improvement
Docker image    3.23 GB                               1.04 MB          3,100x smaller
Memory (RSS)    300-500+ MB                           2.8 MB           ~150x less
Startup time    30-60 seconds                         <1 ms            instant
Throughput      baseline                              12-21x faster    measured in CI
p50 latency     977 ms                                75 ms            13x faster
Processes       4 (sh, java, nats, node)              1                single process
Runtime deps    JRE, Node.js, NATS, pnpm, MikroORM    0                zero
Binary size     N/A (3 runtimes)                      3.7 MB           single static binary

These aren't aspirational numbers. They're measured. docker images for the image size. ps -o rss= during active replication for memory. /usr/bin/time for startup.

The architecture went from this:

PostgreSQL WAL ──> Debezium (Java) ──> NATS (Go) ──> Worker (Node.js) ──> PostgreSQL

To this:

PostgreSQL WAL ──> zemi (single Zig binary) ──> PostgreSQL

What This Means

The interesting thing isn't that Zig is better than Java for this problem. That's obvious. A tight binary protocol loop doesn't need a JVM, a message broker, and an ORM.

The interesting thing is that I could act on that knowledge. Before AI, "Zig would be perfect for this" was an observation I'd file away and never act on. The gap between knowing the right tool and being able to use it was weeks of learning. Now it's hours of reviewing.

This changes which projects are worth doing. Not just "can I build this faster?" but "can I build this correctly?" The right language, the right architecture, the right tradeoffs — chosen for the problem, not for my resume.

Zemi isn't deployed yet. The tests pass, the benchmarks are real, but I haven't put it in front of production traffic. That's next. But the fact that I got from "this demo was embarrassing" to "here's a working replacement with E2E tests and CI" in an afternoon — in a language I've never written — is the point.

I'll still reach for Ruby or JavaScript when they're right. But when the problem is parsing binary protocols over TCP with predictable memory usage, I can reach for Zig and actually ship it.


This is part of the "Wrestling with LLMs" series. Previous posts: Managing Smarmy Clippys, Zen and the Art of Vibe Coding, The Developer's New Hats, The Parachute on the Dragster, The Staging Lights.
