Research, Build, Story

samkhya v1.0 ships an LLM-pluggable corrector backend for embedded analytical engines: DataFusion, DuckDB, Polars, Postgres, Iceberg, and gpudb. Plug Claude, GPT-4o-mini, or local Ollama into the cardinality-estimation slot via a simple HTTP wire contract (Python FastAPI and Node TypeScript reference servers ship in the box). Every LLM output is clamped from above by a provable pessimistic ceiling (LpJoinBound, 40.95× tighter than the 2008 AGM bound), so the LLM can never make your plan worse than the engine's native estimate. Transport-floor latency measures 0.07–0.11 ms at P95; the live-LLM end-to-end benchmark cells are honestly marked PROJECTED pending API budget.
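Clamping from above comes down to taking the minimum of the LLM's estimate and the pessimistic ceiling. Below is a minimal Python sketch of that step. It assumes a hypothetical `clamp_cardinality` helper (not samkhya's actual API), treats the ceiling as an already-computed number (how LpJoinBound is derived is not shown), and assumes the engine falls back to its native estimate when the LLM reply is missing or unusable.

```python
import math
from typing import Optional


def clamp_cardinality(
    llm_estimate: Optional[float],
    ceiling: float,
    native_estimate: float,
) -> float:
    """Hypothetical helper: cap an LLM cardinality estimate at a provable upper bound.

    `ceiling` stands in for a pessimistic bound such as LpJoinBound; this sketch
    does not compute it.
    """
    # Assumption: a missing, non-finite, or negative LLM reply is discarded and the
    # engine's native estimate is used instead (still capped by the same ceiling).
    if llm_estimate is None or not math.isfinite(llm_estimate) or llm_estimate < 0:
        return min(native_estimate, ceiling)
    # Clamp from above: whatever the LLM says, the result never exceeds the ceiling.
    return min(llm_estimate, ceiling)
```

For example, `clamp_cardinality(5_000_000, 1_200_000, 800_000)` returns `1_200_000`: an LLM overestimate is cut down to the ceiling, while an estimate already below the ceiling passes through unchanged.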

Latest Posts


AI & Machine Learning · 7 min read

On-Device AI Just Got Real

For three years, on-device AI was a demo that almost worked. In June 2026 it stopped being one. Sparse models like Apple's AFM 3 and Google's Gemma 4 made intelligence large in flash, small in motion, free to run, and offline by default.

Jun 27, 2026

Opinion · 10 min read

The Coding-Agent Arms Race: Who Survives the H1-2026 Shakeout

In six months, AI coding agents went from features to a brutal platform war: $26B startups, a new frontier model every six weeks, pricing whiplash, and a reverse-acquihire that gutted a unicorn. The agent you build on is now a strategic bet.

Jun 20, 2026

Data Engineering · 10 min read

Streaming OLAP: The Post-Kafka Stack for Real-Time Analytics

The Kafka + Flink + ClickHouse/Pinot/Druid stack we built between 2018 and 2024 is fragmenting into three forks: single-engine streaming SQL, table-format-as-stream, and OLAP databases that eat the streaming layer entirely. Kafka isn't dying — it's becoming plumbing.

Jun 13, 2026

Data Engineering · 6 min read

What I Learned Writing GPU Kernels for SQL Aggregates

Three months, two abandoned designs, one breakthrough. The one-paragraph version: Apple Silicon GPUs lack a 64-bit atomic_fetch_add on all but very recent OS versions, and that single missing instruction shapes every other architectural decision in a Metal SQL aggregate engine.

Jun 6, 2026

Data Engineering · 5 min read

Multi-Aggregate Fusion: One Read, Four Answers

Every analytical engine treats SELECT SUM(x), MIN(x), MAX(x), COUNT(x) FROM t as four passes of the column. Fuse them into a single kernel and the speedup over four-pass code ranges from 9x to 25x. Here's why the technique works, and the data shape where it doesn't.

May 30, 2026
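The core of multi-aggregate fusion is one loop over the column that updates all four accumulators, instead of four separate scans. The Python sketch below shows only that scalar idea. The names `FusedAggregates` and `fused_aggregate` are hypothetical, and the sketch says nothing about the GPU kernel or the 9x to 25x figure, which depend on the engine's actual implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FusedAggregates:
    # SQL semantics: SUM/MIN/MAX over zero non-NULL rows is NULL (None); COUNT(x) is 0.
    total: Optional[float] = None
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    count: int = 0


def fused_aggregate(column: Iterable[Optional[float]]) -> FusedAggregates:
    """Compute SUM, MIN, MAX and COUNT of one column in a single pass."""
    acc = FusedAggregates()
    for x in column:  # each value is read exactly once
        if x is None:  # column aggregates skip NULLs
            continue
        acc.total = x if acc.total is None else acc.total + x
        acc.minimum = x if acc.minimum is None else min(acc.minimum, x)
        acc.maximum = x if acc.maximum is None else max(acc.maximum, x)
        acc.count += 1
    return acc
```

For the column `[3.0, None, -1.0, 5.0]`, one pass yields a total of 7.0, a minimum of -1.0, a maximum of 5.0 and a count of 3. Four-pass code would read the same column four times to produce the same answers.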

Data Engineering · 6 min read

Apple Silicon's Unified Memory Is the Quiet Revolution in Analytical Compute

M3 Ultra ships 512 GB of memory at 819 GB/s, addressable by the GPU with zero PCIe transfer cost. Every GPU database project from the past decade was architected around the assumption that memory bandwidth came at PCIe-tax prices. That assumption is now wrong on a fifth of the developer laptops in the world.

May 23, 2026

Stay Curious

Exploring the frontiers of AI, data, and technology. New research and insights published regularly.


AI & Machine Learning

Data Engineering

Stories

From The First Mind

The Hackathon

Words Become Numbers

The Map of Meaning

Friends in High Dimensions

The First Spark

The Stack of Minds

Featured

samkhya v1.0: Plug Claude, GPT-4o-mini, or Local Ollama Into Your SQL Query Optimizer