Databricks says it solved the decades-old data pipeline problem that's been slowing AI agents
<p>For decades, data professionals have struggled to manage operational and analytical databases in a unified way without introducing latency and degrading performance.</p><p>Agents made the problem structural. A system that reasons continuously and acts on live data cannot tolerate a pipeline between itself and the information it needs to act on.</p><p>At the Data + AI Summit on Tuesday, Databricks announced two products aimed at collapsing that infrastructure. Lakehouse//RT delivers millisecond query latency directly on governed Delta and Iceberg tables, eliminating the dedicated real-time serving tier that enterprises have maintained alongside their lakehouses. LTAP, short for Lake Transactional/Analytical Processing, stores Postgres-native transactional data in Delta and Iceberg format from the point of write, removing the ETL pipelines that have connected operational and analytical systems for decades.</p><p>Reynold Xin, co-founder of Databricks, described a simpler data stack as "the holy grail for agents" in a briefing with VentureBeat. As users vibe code more applications, he argued, the agents reasoning analytically on top of those apps need the underlying infrastructure out of the way so they can move fast.</p><p>"The agents really prefer a much simpler stack, because they can move way faster," he said.</p><h2>LTAP bets on storage-layer unification where HTAP tried engine convergence</h2><p>Many vendors have tried various approaches over the decades to unify analytical and transactional data.</p><p>Back in 2014, analyst firm Gartner coined the term HTAP, short for Hybrid Transactional/Analytical Processing, to describe vendors that attempted to unify the two types of databases. 
MemSQL (now known as <a href="https://venturebeat.com/data-infrastructure/singlestore-ceo-sees-little-future-for-purpose-built-vector-databases">SingleStore</a>), SAP HANA and Oracle's <a href="https://venturebeat.com/ai/oracle-mysql-heatwave-lakehouse-goes-ga-to-query-data">MySQL HeatWave</a> are among the many HTAP products on the market.</p><p>LTAP is Databricks' answer to HTAP, using the Lakebase architecture to unify data at the storage layer rather than the engine level. <a href="https://venturebeat.com/data/databricks-serverless-database-slashes-app-development-from-months-to-days">Lakebase</a> is Databricks' serverless, cloud-based PostgreSQL database service, which became generally available in February.</p><p>"HTAP to us is kind of more of a failure of the industry rather than a success," Xin said.</p><p>The LTAP approach targets the storage layer instead of the query layer. Lakebase previously stored Postgres data in Postgres format on object storage, which required conversion before the Lakehouse's analytical engines could use it efficiently. With LTAP, transactional data lands directly in Delta or Iceberg format, in the same copy that analytical workloads read. Postgres remains the transactional engine; Spark and the Lakehouse remain the analytical engine.</p><p>"The whole point is, hey, you use the best tool for the job at the query engine level, we just make sure underlying storage is a single copy of the data," Xin said.</p><p>The central engineering challenge is latency. Object storage has response times in the seconds range, far too slow for OLTP workloads that require sub-millisecond performance. Lakebase bridges that gap with a caching layer between Postgres compute instances and object storage. The key design decision is where the columnar conversion happens: idle CPU capacity in that caching layer performs the row-to-column conversion before data lands in object storage. 
</p><p>"When you convert data from row to column, it compresses more than 10 times, typically, so now you substantially reduce the network cost ... between that caching layer and the object stores," Xin said.</p><h2>Lakehouse//RT delivers millisecond query latency on live lakehouse data without a separate serving tier</h2><p>Lakehouse//RT is Databricks' answer to the dedicated real-time serving tier: the separate system enterprises have maintained alongside their lakehouses to handle low-latency queries, at the cost of data copies, split governance and pipeline complexity that agents cannot work around. Key capabilities of Lakehouse//RT include:</p><p><b>Reyden compute engine:</b> Built specifically for high-concurrency, low-latency serving, Reyden queries Delta and Iceberg tables directly without moving data out of the lakehouse.</p><p><b>Latency and throughput:</b> According to Databricks, Lakehouse//RT delivers sub-100ms latency at 12,000 queries per second, with response times as low as 10ms on smaller datasets and up to 16x better performance than existing dedicated serving stacks.</p><p><b>Governance and data access:</b> Every query runs within Unity Catalog's governance framework, with no separate permissions layer, no data copies and no ingestion pipelines.</p><h2>Analysts see the agentic framing and open format approach as the real differentiators</h2><p>The problem both products address is well documented among enterprise data teams, but analysts distinguish between the pain point and the specific claim Databricks is making.</p><p>"Enterprises have had HTAP, streaming, cloud warehouses, and operational stores for years," Stephanie Walter, Practice Leader for AI Stack at HyperFRAME Research, told VentureBeat. "What is different is the agentic AI framing."</p><p>Walter noted that agents need live operational data, historical context, governance, retrieval, and write-back in the same workflow. 
</p><p>"That is a strong architecture argument, but Lakebase still has to prove it can meet the latency, reliability, and operational maturity CIOs expect," she said.</p><p>Mike Leone, analyst at Moor Insights and Strategy, said the path to genuine differentiation is more specific than the unification concept itself. He noted that open analytics on a data lake is now table stakes, with many vendors offering it in some form.</p><p>"The less common move is letting the transactional writes land in open formats too, so the operational database isn't sitting in a proprietary box while only the analytics half is open," Leone told VentureBeat.</p><p>He added that the open format approach, paired with Lakehouse//RT querying live data directly off the lake, is what gives the architecture a credible case for retiring a whole row of specialized systems.</p><p>The technical claim that will face the most scrutiny is also the most central one. "The piece I'd still want their engineers to walk through is how both engines truly share one copy without a quiet conversion step doing the syncing in the middle," Leone said.</p><h2>What this means for enterprises</h2><p>For data engineers evaluating their stack for agentic workloads, the question is no longer which best-of-breed tool to run for each job, but whether running separate tools at all is still defensible.</p><p><b>Enterprises that built separate operational databases, real-time serving tiers and analytical lakehouses could previously treat the gaps between them as a maintenance burden.</b> Agents turn those gaps into an operational risk: a system reasoning across governance boundaries will find the inconsistencies faster than any human team.</p><p><b>The market is moving away from specialized serving layers faster than most vendor roadmaps anticipated. 
</b>According to <a href="https://venturebeat.com/data/the-retrieval-rebuild-why-hybrid-retrieval-intent-tripled-as-enterprise-rag-programs-hit-the-scale-wall">VB Pulse Q1 2026</a>, a three-wave longitudinal survey of organizations with 100-plus employees, hybrid retrieval intent tripled from 10.3% to 33.3% across the quarter while standalone vector database adoption declined across every tracked vendor. The same consolidation logic is now hitting the real-time serving tier. <b>The traditional approach (best-of-breed tools for each workload type, with pipelines between them) was built for human-speed analytical consumption.</b> Agent workloads don't tolerate that architecture.</p><p>"The pain they're pointing at, all the copying and syncing between operational and analytical systems, is real and expensive, and anyone running this at scale feels it," Leone said.</p>
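<p>Xin's point about row-to-column conversion shrinking data before it crosses the network can be illustrated with a small, hypothetical sketch. The Python below is not Databricks code and says nothing about how Lakebase's caching layer is actually built. It serializes the same invented order records two ways: row by row, as a transactional store lays them out, and column by column, as columnar files such as Parquet (commonly used by Delta and Iceberg tables) lay them out. It then compresses both with zlib. The record fields, value patterns and choice of zlib are all assumptions made for illustration.</p>

```python
import zlib

# Hypothetical order records: field names and value patterns are invented for illustration.
STATUSES = ["pending", "shipped", "delivered"]
REGIONS = ["us-east", "us-west", "eu-west", "eu-central", "ap-south"]


def make_rows(n: int) -> list[tuple[int, str, str, int]]:
    """Build n synthetic records in the row-oriented shape a transactional store writes."""
    return [(i, STATUSES[i % 3], REGIONS[i % 5], (i * 37) % 1000) for i in range(n)]


def row_layout(rows: list[tuple[int, str, str, int]]) -> bytes:
    """Serialize record by record: each record's fields sit next to each other."""
    return "".join(f"{i},{s},{r},{a}\n" for i, s, r, a in rows).encode()


def column_layout(rows: list[tuple[int, str, str, int]]) -> bytes:
    """Serialize field by field: all values of one column are stored contiguously."""
    columns = zip(*rows)  # transpose the records into one sequence per field
    return "".join("\n".join(map(str, col)) + "\n" for col in columns).encode()


def compressed_size(data: bytes) -> int:
    """Size in bytes after general-purpose DEFLATE compression (zlib level 9)."""
    return len(zlib.compress(data, 9))


if __name__ == "__main__":
    rows = make_rows(10_000)
    raw_row, raw_col = row_layout(rows), column_layout(rows)
    print(f"row-oriented:    {len(raw_row)} bytes raw, {compressed_size(raw_row)} compressed")
    print(f"column-oriented: {len(raw_col)} bytes raw, {compressed_size(raw_col)} compressed")
```

<p>Both layouts contain exactly the same characters before compression, so any difference in compressed size comes from layout alone: grouping a column's values together places repeated values, such as status and region codes, next to each other, where the compressor can exploit them. The ratio on this toy data should not be read as a measurement of the more-than-10x figure Xin cites, and real columnar formats add encodings such as dictionary and run-length encoding on top of general-purpose compression.</p>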