Data Engineering

Pipelines, warehouses, GPU-accelerated query engines, and big-data systems.

13 posts

Streaming OLAP: The Post-Kafka Stack for Real-Time Analytics

The Kafka + Flink + ClickHouse/Pinot/Druid stack we built between 2018 and 2024 is fragmenting into three forks: single-engine streaming SQL, table-format-as-stream, and OLAP databases that eat the streaming layer entirely. Kafka isn't dying — it's becoming plumbing.

Jun 13, 2026

Data Engineering · 6 min read

What I Learned Writing GPU Kernels for SQL Aggregates

Three months, two abandoned designs, one breakthrough. The one-paragraph version: Apple Silicon GPUs don't have 64-bit atomic_fetch_add until very recent OS versions, and that single missing instruction shapes every other architectural decision in a Metal SQL aggregate engine.

Jun 6, 2026

Data Engineering · 5 min read

Multi-Aggregate Fusion: One Read, Four Answers

Every analytical engine treats SELECT SUM(x), MIN(x), MAX(x), COUNT(x) FROM t as four passes over the column. Fuse them into a single kernel and the speedup over four-pass code is 9× to 25×. Here's why the technique works, and the data shape where it doesn't.

May 30, 2026

Data Engineering · 6 min read

Apple Silicon's Unified Memory Is the Quiet Revolution in Analytical Compute

M3 Ultra ships 512 GB of memory at 819 GB/s, addressable by the GPU with zero PCIe transfer cost. Every GPU database project from the past decade was architected around the assumption that memory bandwidth came at PCIe-tax prices. That assumption is now wrong on a fifth of the developer laptops in the world.

May 23, 2026

Data Engineering · 16 min read

samkhya v1.0: Plug Claude, GPT-4o-mini, or Local Ollama Into Your SQL Query Optimizer

samkhya v1.0 ships an LLM-pluggable corrector backend for embedded analytical engines — DataFusion, DuckDB, Polars, Postgres, Iceberg, gpudb. Plug Claude, GPT-4o-mini, or local Ollama into the cardinality-estimation slot via a simple HTTP wire contract (Python FastAPI and Node TypeScript reference servers ship in the box). Every LLM output is clamped from above by a provable pessimistic ceiling (LpJoinBound — 40.95× tighter than the 2008 AGM bound) so the LLM can never make your plan worse than the engine's native estimate. Transport-floor latency measured at P95 0.07–0.11 ms; live-LLM end-to-end cells honestly marked PROJECTED pending API budget.

May 17, 2026

Data Engineering · 27 min read

Why I built a GPU SQL engine in 2026 — when every other one died

Every standalone GPU database built between 2013 and 2024 was acqui-hired or pivoted. So why ship gpudb in 2026? Because nobody had wired Apple Silicon's unified memory into a SQL engine — and DuckDB hands you a hundred-thousand-user distribution channel without writing a database from scratch.

May 9, 2026

Data Engineering · 5 min read

Databricks vs Snowflake vs The New Wave: The Data Engineering Paradigm Shift

Snowflake just posted $4.68B in FY26 revenue at 29% growth. Databricks crossed $5.4B ARR in February at 65% growth. And neither chart explains why the most interesting data infrastructure being shipped in 2026 is single-process, embeddable, and runs on a laptop.

Apr 27, 2026

Data Engineering · 10 min read

Iceberg's Puffin Sidecars: Portable Stats for the Open Lakehouse

Apache Iceberg's Puffin file format is the most strategically important subsystem nobody is talking about. It is the mechanism by which an open lakehouse can carry warehouse-grade statistics across vendors — write the sketch once in Trino, read it tomorrow in Snowflake, plan a join correctly on the first cold query.

Mar 18, 2026

Data Engineering · 6 min read

DuckDB Ate the Modern Data Stack

An embedded analytical engine with no servers, no cluster, no migration cost just quietly displaced Spark for small data and Snowflake XS for medium data. MotherDuck closed Series B at a $400M post-money. Here's the part everyone undercounts.

Feb 17, 2026

Data Engineering · 5 min read

Iceberg, Delta, Hudi: Pick One in 2026 and Move On

The table-format wars are functionally over. Iceberg won on interop. Delta won on installed base. Hudi won on streaming upserts. The decision tree for a new project in 2026 is shorter than the comparison-blog industry wants you to believe.

Feb 12, 2026

Data Engineering · 9 min read

Polars vs DuckDB in 2026: When To Pick Which

Polars ate Pandas. DuckDB ate everything below the warehouse. The 2023 expectation was a cage match between two in-process analytical engines — the 2026 reality is they ate different cakes, and the decision is mostly about whether your team thinks in DataFrames or SQL.

Jan 8, 2026

Data Engineering · 10 min read

Vector Indexes in OLAP Engines: 2025 Is Where Search Ate Analytics

DuckDB, ClickHouse, Snowflake, BigQuery, Postgres — by late 2025 every serious analytical engine ships a native vector index. That wasn't an AI-hype reflex. It was the realization that embedding search is just a column scan with a different distance function, and the warehouse-plus-vector-DB split was operational waste for the 90% case.

Oct 20, 2025

Data Engineering · 10 min read

Apache Arrow IPC vs JSON: The Numbers Behind the Switch

Most data-API traffic in 2025 still moves as JSON because humans need to read it. But for any system actually shipping columnar batches between services — analytical pipelines, feature stores, embedding services, MCP-style tool calls — Arrow IPC is 3–30× faster end-to-end. Honest accounting of when the switch pays off and when JSON is still correct.

Aug 15, 2025