
Apache Iceberg vs. Delta Lake vs. Apache Hudi: A Technical Deep Dive

In the modern data lakehouse architecture, the storage layer has evolved from a simple repository of raw files into a transactional database environment. This transformation is made possible by open table formats: software layers that sit between raw data files (such as Apache Parquet or Apache ORC) and compute engines (such as Dremio, Apache Spark, and Trino).

Three formats have emerged as the industry standards for managing mutable analytical tables on object storage: Apache Iceberg, Delta Lake, and Apache Hudi. While all three solve the same core problem (providing ACID transactions, consistent reads, schema enforcement, and time travel over cheap storage), they were designed with different architectural priorities. Consequently, each format is optimized for different workloads, engineering ecosystems, and query patterns.

This guide provides an engine-neutral comparison of the three table formats. We examine their historical origins, compare their metadata architectures, analyze performance benchmarks across different engines, evaluate their row-level deletion strategies, and establish a decision framework to help data architects select the optimal format for their analytical workloads.

Origins and Governance Models

The design priorities of each format are deeply rooted in their historical origins. Understanding where and why these formats were built explains their core architectural trade-offs.

Apache Iceberg: Open Standards and Multi-Engine Interoperability

Apache Iceberg was originally developed at Netflix in 2017 by Ryan Blue and Dan Weeks. At the time, Netflix managed massive datasets on AWS S3 and relied on Apache Hive to structure tables. As data volumes expanded, Netflix engineers encountered major operational bottlenecks with Hive. These included directory listing latency on S3, non-atomic commits during concurrent writes, and query planning overhead.

Iceberg was designed from the ground up to solve these problems by shifting table state tracking from directory locations to a tree of metadata files. Netflix's primary goal was to ensure that multiple compute engines (such as Spark, Trino, and Flink) could read and write to the same tables concurrently without lock-in. Netflix donated Iceberg to the Apache Software Foundation in 2018, and it graduated to a top-level project in 2020.

Delta Lake: Spark-First Optimizations and Commercial Integration

Delta Lake originated at Databricks, which first announced it as Databricks Delta in 2017. Databricks built Delta Lake to address reliability issues in Apache Spark workloads running on cloud object storage. Before Delta Lake, engineers writing Spark pipelines to S3 or ADLS had no transaction isolation and struggled with partial write failures that left data lakes in a corrupted state.

Delta Lake solved these issues by implementing a transaction log directory (named _delta_log) alongside the data files. This log acts as a single source of truth, tracking transactions sequentially. Initially, Delta Lake was tightly coupled with Apache Spark, and several advanced features were proprietary to the Databricks platform. Databricks open-sourced the format in 2019 under the Linux Foundation, and subsequently released Delta Lake 3.0 in 2023 to bring greater parity between the open-source library and its proprietary features.

Apache Hudi: High-Frequency Streaming and Incremental CDC

Apache Hudi (Hadoop Upserts Deletes and Incrementals) was developed at Uber in 2016 by Vinoth Chandar. Uber needed to ingest massive volumes of ride-sharing and passenger data in real time, executing high-frequency updates and deletes (slowly changing dimensions and change data capture feeds) over Hadoop HDFS.

Uber designed Hudi specifically to optimize write performance for frequent key-based updates and to enable incremental query processing. Hudi was donated to the Apache Software Foundation in 2019 and became a top-level project in 2020. Hudi is unique in its focus on streaming ingestion, utilizing index structures (such as Bloom filters or HBase index tables) to perform fast upsert lookups during write operations.

Metadata Architecture: How State is Tracked

The core differentiator between the three formats is how they track which physical data files make up the current state of a logical table. This structural tracking determines how query planning occurs, how concurrent writes are managed, and how files are pruned during execution.

graph TD
    subgraph ICE["Apache Iceberg (Hierarchical Tree)"]
        I1["metadata.json (current pointer)"]
        I2["Manifest List (snapshot state)"]
        I3["Manifest Files (per-file stats)"]
        I4["Parquet Data Files"]
        I1 --> I2 --> I3 --> I4
    end
    subgraph DL["Delta Lake (Sequential Log)"]
        D1["_delta_log/ (JSON commit files)"]
        D2["Parquet Checkpoint Files"]
        D3["Parquet Data Files"]
        D1 --> D2
        D1 --> D3
    end
    subgraph HUDI["Apache Hudi (Timeline and Indexes)"]
        H1[".hoodie/ timeline (commit metadata)"]
        H2["Key Index (Bloom / Bucket / Metadata Table)"]
        H3["Base Files (Parquet)"]
        H4["Log Files (Avro delta logs)"]
        H1 --> H2
        H2 --> H3
        H2 --> H4
    end

1. Apache Iceberg: Hierarchical Metadata Tree

Iceberg tracks table state using a three-tiered metadata tree stored directly alongside data files in object storage: a table metadata file (metadata.json) that points to the current snapshot, a manifest list that describes that snapshot, and manifest files that record per-file statistics. This hierarchical layout enables engines to execute query planning without performing costly directory listings.

Because of this structure, locating the current table state takes a fixed number of metadata reads rather than a number of listing calls that grows with the partition count. Compute engines query the catalog to find the metadata JSON path, read the manifest list, prune manifests based on query filters, and then scan only the relevant manifests. No directory listing is required, which eliminates cloud object storage listing penalties.
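The pruning walk described above can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: the `Manifest` and `DataFile` classes, the file names, and the single `order_date` bound are hypothetical, and real Iceberg manifests are Avro files with richer statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    min_date: str  # lower bound of order_date in this file (ISO date string)
    max_date: str  # upper bound of order_date in this file


@dataclass(frozen=True)
class Manifest:
    path: str
    min_date: str  # summary bounds across every file tracked by this manifest
    max_date: str
    files: tuple


def overlaps(lo, hi, q_lo, q_hi):
    """True when the closed ranges [lo, hi] and [q_lo, q_hi] intersect."""
    return lo <= q_hi and q_lo <= hi


def plan_scan(manifest_list, q_lo, q_hi):
    """Return (paths of data files to scan, number of manifests opened)."""
    opened = 0
    selected = []
    for manifest in manifest_list:
        # The manifest list carries summary bounds, so whole manifests are skipped unread.
        if not overlaps(manifest.min_date, manifest.max_date, q_lo, q_hi):
            continue
        opened += 1
        # Per-file bounds inside an opened manifest prune individual data files.
        for data_file in manifest.files:
            if overlaps(data_file.min_date, data_file.max_date, q_lo, q_hi):
                selected.append(data_file.path)
    return selected, opened


jan = Manifest("m-jan.avro", "2024-01-01", "2024-01-31", (
    DataFile("jan-a.parquet", "2024-01-01", "2024-01-15"),
    DataFile("jan-b.parquet", "2024-01-16", "2024-01-31"),
))
feb = Manifest("m-feb.avro", "2024-02-01", "2024-02-29", (
    DataFile("feb-a.parquet", "2024-02-01", "2024-02-29"),
))

files, opened = plan_scan([jan, feb], "2024-01-20", "2024-01-25")
print(files, opened)
```

In this sketch the February manifest is never opened and only one of the three data files is selected, with no directory listing involved at any step.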

2. Delta Lake: Sequential Transaction Log

Delta Lake tracks state using a directory named _delta_log/ located at the root of the table. Rather than using a hierarchical tree, Delta Lake relies on a sequential log of transaction files.

To read a Delta Lake table, the query engine reads the _last_checkpoint file in the log directory to find the latest checkpoint version. It reads that checkpoint Parquet file directly, then lists the _delta_log/ directory to locate any subsequent JSON commit files written after the checkpoint. The engine replays these newer JSON commits in memory to compile the final list of active data files. Because directory listing is required to discover the latest JSON files after a checkpoint, Delta Lake query planning performance can degrade if log directories accumulate too many uncompacted commits, requiring regular log cleanups.
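The replay step can be illustrated with a small sketch. It assumes a heavily simplified log in which each commit is newline-delimited JSON containing only `add` and `remove` actions with a `path` field; the `replay` helper and the file names are hypothetical, and real commits also carry metadata, protocol, and statistics fields that are ignored here.

```python
import json


def replay(checkpoint_files, commits):
    """Rebuild the set of active data files from a checkpoint plus later commits."""
    active = set(checkpoint_files)
    # Commits are applied in version order so later actions override earlier ones.
    for version in sorted(commits):
        for line in commits[version].splitlines():
            action = json.loads(line)
            if "add" in action:
                active.add(action["add"]["path"])
            elif "remove" in action:
                active.discard(action["remove"]["path"])
    return active


# State captured by a checkpoint at version 10 (hypothetical file names).
checkpoint_v10 = {"part-000.parquet", "part-001.parquet"}

# JSON commits discovered by listing the log directory after the checkpoint.
commits_after_checkpoint = {
    11: '{"add": {"path": "part-002.parquet"}}',
    12: '{"remove": {"path": "part-000.parquet"}}\n'
        '{"add": {"path": "part-003.parquet"}}',
}

active_files = replay(checkpoint_v10, commits_after_checkpoint)
print(sorted(active_files))
```

The cost of this loop grows with the number of commits written since the last checkpoint, which is why a long tail of uncompacted commits slows planning.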

3. Apache Hudi: Timeline and Index Layers

Apache Hudi uses a metadata directory named .hoodie/ to maintain a transactional timeline of commits. Hudi is designed to optimize write performance for frequent key-based updates, relying on indexes to locate files rather than parsing metadata trees.

When writing data, Hudi uses its index layer to check whether an incoming record key already exists in the table. If it does, the writer updates the file that holds the record (or appends to a delta log); if not, it inserts the record into a new file. This index-centric architecture makes key-based updates efficient, but every write pays the cost of the index lookup.
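A minimal sketch of index-driven upsert routing follows, assuming a toy Bloom filter per base file. The `BloomFilter` class, `route_upsert`, and the file and key names are hypothetical and do not reflect Hudi's actual index implementation; the point is only that a Bloom filter rules files out cheaply and that a positive answer still has to be confirmed.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter backed by a Python int used as a bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive several bit positions per key by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_file(keys):
    """Return (bloom filter, exact key set) for one hypothetical base file."""
    bloom = BloomFilter()
    for key in keys:
        bloom.add(key)
    return bloom, set(keys)


def route_upsert(key, base_files):
    """Return the base file to update, or None when the key is a new insert."""
    for name, (bloom, keys) in base_files.items():
        # Only files whose filter says "maybe" are checked against real keys.
        if bloom.might_contain(key) and key in keys:
            return name
    return None


base_files = {
    "base-a.parquet": build_file(["order-1", "order-2"]),
    "base-b.parquet": build_file(["order-3"]),
}
print(route_upsert("order-2", base_files))
print(route_upsert("order-999", base_files))
```

The exact-key check after the filter matters because Bloom filters can return false positives but never false negatives.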

Aggregated Performance Benchmarks

Performance comparison across table formats is not static. It depends heavily on the query engine utilized, library versions, clustering layouts, and query workloads. We have aggregated real-world benchmark data comparing these formats.

Workload Categories and Results

The following evaluations are based on enterprise testing using Spark 3.5, Dremio 25.x, and Trino 450 query engines, with table libraries set to Iceberg 1.6, Delta 3.2, and Hudi 0.15. Workloads are categorized into scan throughput, concurrent writes, and point lookups.

| Workload Type | Apache Iceberg | Delta Lake | Apache Hudi | Workload Summary and Performance Drivers |
| --- | --- | --- | --- | --- |
| Scan Throughput (Read-Heavy) | Excellent (Fast Pruning) | Excellent (Fast Pruning) | Moderate (Index Overhead) | Iceberg and Delta prune files efficiently using column stats. Hudi's index checks add scan overhead. |
| Concurrent Writes (Optimistic Lock) | High (Conflict Retries) | High (Conflict Retries) | Moderate (Queue Block) | Iceberg and Delta resolve concurrent appends via retry loops. Hudi uses lock providers to serialize commits. |
| Point Lookups (Key Searches) | Moderate (Full Scan) | Moderate (Full Scan) | Excellent (Index Lookup) | Hudi locates record keys directly using Bloom filters or HBase indexes, bypassing full table scans. |
| CDC Ingestion (Upsert/Delete) | High (Merge-on-Read) | High (Deletion Vectors) | Excellent (Timeline Compaction) | Hudi optimizes streaming CDC using log merges. Delta and Iceberg rely on positional delete file joins. |

Optimistic Concurrency Control vs. Pessimistic Lock Providers

Data write conflicts are managed differently across the formats. Apache Iceberg and Delta Lake rely primarily on Optimistic Concurrency Control (OCC). Under OCC, writers assume that conflicts are rare. When a transaction starts, the writer reads the table's current snapshot and prepares its changes (writing new data or delete files) in isolation. When the writer attempts to commit, it checks if another writer has committed a new snapshot since the transaction began. If no conflict is found, the commit succeeds. If a conflict occurs (for example, if another writer modified the same files or partitions), the transaction fails and the writer must retry. Iceberg handles this by reading the updated metadata, checking if the changes overlap, and applying a retry loop up to a configured threshold. This model works exceptionally well for appends and disjoint updates, but experiences high commit failure rates during heavy concurrent updates to the same partitions.
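The optimistic commit loop can be sketched as a compare-and-swap on a version counter. This is a simplified model, not any format's real commit protocol: the `Table` class, `commit_append`, and the `before_commit` hook (used only to inject a rival writer) are hypothetical names introduced for illustration.

```python
class Table:
    """Hypothetical stand-in for a table pointer guarded by compare-and-swap."""

    def __init__(self):
        self.version = 0
        self.files = frozenset()

    def compare_and_swap(self, expected_version, new_files):
        # The swap succeeds only if nobody committed since the writer read the table.
        if self.version != expected_version:
            return False
        self.version += 1
        self.files = frozenset(new_files)
        return True


def commit_append(table, new_file, max_retries=3, before_commit=None):
    """Append one file optimistically; return the attempt number that succeeded."""
    for attempt in range(1, max_retries + 1):
        # Read the current snapshot and prepare the change against it.
        base_version, base_files = table.version, table.files
        if before_commit:
            before_commit(attempt)  # test hook: lets a rival writer sneak in
        if table.compare_and_swap(base_version, base_files | {new_file}):
            return attempt
        # Conflict detected: loop again to re-read the refreshed snapshot.
    raise RuntimeError("commit failed after retries")


table = Table()


def rival(attempt):
    # A second writer commits while the first attempt is still in flight.
    if attempt == 1:
        table.compare_and_swap(table.version, table.files | {"rival.parquet"})


attempts = commit_append(table, "mine.parquet", before_commit=rival)
print(attempts, sorted(table.files), table.version)
```

Appends can always be re-applied on top of a newer snapshot, so the retry succeeds here. An update that overlaps files changed by the rival would instead have to redo its work, which is where the wasted compute under heavy contention comes from.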

In contrast, Apache Hudi supports both OCC and multi-writer concurrency control via explicit lock providers (such as ZooKeeper, AWS DynamoDB, or Hive Metastore locks). When multiple writers attempt updates, Hudi uses these lock providers to serialize write operations. Hudi's lock-based approach prevents concurrent commit retries by forcing writers to acquire a lock before final commit, reducing compute waste on retries during high-concurrency workloads but adding dependency management overhead.
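For contrast, lock-based serialization can be sketched with an in-process lock. In this illustration `threading.Lock` merely stands in for an external lock provider such as ZooKeeper or DynamoDB, and the `LockedTimeline` class is hypothetical; a real commit path also performs conflict checks while holding the lock.

```python
import threading


class LockedTimeline:
    """Hypothetical timeline where every commit must hold a lock first."""

    def __init__(self):
        self._lock = threading.Lock()  # stand-in for an external lock provider
        self.commits = []

    def commit(self, writer_id, files):
        # Writers queue on the lock, so commits land one at a time without retries.
        with self._lock:
            instant = len(self.commits) + 1
            self.commits.append((instant, writer_id, tuple(files)))
            return instant


timeline = LockedTimeline()
threads = [
    threading.Thread(
        target=timeline.commit,
        args=(f"writer-{i}", [f"file-{i}.parquet"]),
    )
    for i in range(8)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

instants = [commit[0] for commit in timeline.commits]
print(instants)
```

No writer ever repeats work in this model, but every writer depends on the lock service being available, which is the operational overhead the paragraph above refers to.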

Engine Selection Caveats: Dremio vs. Spark vs. Trino

The choice of compute engine exerts a larger influence on performance than the choice of table format itself. An unoptimized engine configuration can nullify the benefits of a format's metadata layout.

Physical Layout Optimization: Iceberg Z-Order vs. Delta Liquid Clustering

To achieve sub-second read performance, data files must be organized to group related values together. This maximizes the efficiency of file pruning.

Apache Iceberg uses Z-Ordering (a multi-dimensional space-filling curve) to cluster data files. When compacting a table, Iceberg reorganizes rows across multiple columns (such as customer_id and order_date) to ensure that the min/max ranges for these columns are highly localized. This allows query engines to skip scanning files during execution. However, running Z-order compaction is a compute-intensive batch operation that must be scheduled periodically.
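The effect of a space-filling curve on min/max ranges can be shown with a toy bit-interleaving (Morton) encoding over two small integer columns. This is a sketch of the general idea only: `interleave_bits`, `file_ranges`, and `files_to_scan` are hypothetical helpers, and production implementations must also handle strings, dates, signed values, and more than two columns.

```python
def interleave_bits(x, y, bits=16):
    """Interleave the bits of x and y into a single Z-order (Morton) value."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x occupies the even bit positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y occupies the odd bit positions
    return z


def file_ranges(points, rows_per_file):
    """Split sorted rows into files and record each file's min/max per column."""
    ranges = []
    for start in range(0, len(points), rows_per_file):
        chunk = points[start:start + rows_per_file]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        ranges.append(((min(xs), max(xs)), (min(ys), max(ys))))
    return ranges


def files_to_scan(ranges, y_value):
    """Count files whose y range could contain y_value (min/max pruning)."""
    return sum(1 for _, (lo, hi) in ranges if lo <= y_value <= hi)


# A 4 x 4 grid of (x, y) rows written as four files of four rows each.
points = [(x, y) for x in range(4) for y in range(4)]
linear = file_ranges(sorted(points), 4)
zorder = file_ranges(sorted(points, key=lambda p: interleave_bits(*p)), 4)

print(files_to_scan(linear, 3), files_to_scan(zorder, 3))
```

Sorting by x alone leaves every file spanning the full y range, so a filter on y touches all four files. The interleaved ordering keeps both columns localized, so the same filter touches two files, at the price of slightly wider x ranges per file.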

Delta Lake 3.0 and later offer Liquid Clustering as an alternative to Z-ordering. Liquid clustering is a dynamic, incremental clustering strategy. Instead of requiring developers to commit to fixed partition columns or run massive compaction jobs, liquid clustering groups data incrementally as writes occur. It adapts to changing query patterns without requiring existing data to be rewritten, reducing write amplification compared to Z-order compactions.

Row-Level Mutations: Copy-on-Write vs. Merge-on-Read

Analytical data lakes are primarily append-only. However, regulatory requirements (such as GDPR delete requests) and CDC pipelines require row-level updates and deletes. The formats implement distinct write-and-read strategies to handle these mutations.

1. Copy-on-Write (CoW)

Copy-on-Write is supported by all three formats. In CoW mode, any update or delete operation requires the compute engine to rewrite the physical data files containing the targeted records.

If a table contains a 100 MB Parquet file with 1,000,000 rows, and a query deletes a single row, the engine reads the entire 100 MB file, filters out the deleted row, and writes a new 100 MB Parquet file. The metadata is updated to reference the new file and ignore the old one.
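The arithmetic behind this example can be made explicit. The helper below is a back-of-the-envelope sketch that assumes uniformly sized rows and ignores compression effects; `cow_write_amplification` is a hypothetical name.

```python
def cow_write_amplification(file_size_bytes, rows_in_file, rows_changed):
    """Ratio of bytes rewritten to bytes logically changed, assuming uniform rows."""
    avg_row_bytes = file_size_bytes / rows_in_file
    logical_bytes = avg_row_bytes * rows_changed
    # Copy-on-Write rewrites the whole file no matter how few rows changed.
    return file_size_bytes / logical_bytes


file_size = 100 * 1024 * 1024  # the 100 MB file from the example above
amplification = cow_write_amplification(file_size, 1_000_000, 1)
print(f"{amplification:,.0f}x")
```

Under these assumptions, deleting one row out of a million rewrites roughly a million times more data than the row itself occupies.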

2. Merge-on-Read (MoR)

Merge-on-Read optimizes write performance by deferring data rewrites. When an update or delete occurs, the engine writes a separate, smaller file recording the mutation and commits it to metadata.

When a query engine reads a MoR table, it must read the base data files and join them with the delete files to filter out modified rows on the fly.
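The read-side merge can be sketched with positional deletes held in memory. The structures here are hypothetical simplifications: a base file is a list of row dicts, and a delete record is a `(file path, row position)` pair rather than an entry in a real delete file.

```python
def write_position_deletes(file_path, positions):
    """Record deletes as (file path, row position) pairs instead of rewriting data."""
    return [(file_path, pos) for pos in positions]


def read_merge_on_read(base_files, position_deletes):
    """Yield live rows by filtering base rows against the delete records."""
    deleted = set(position_deletes)
    for path, rows in base_files.items():
        for pos, row in enumerate(rows):
            # The base file is untouched; the delete is applied at read time.
            if (path, pos) not in deleted:
                yield row


base_files = {
    "data-1.parquet": [
        {"order_id": 1, "status": "shipped"},
        {"order_id": 2, "status": "pending"},
        {"order_id": 3, "status": "shipped"},
    ],
}

# Deleting order 2 writes one tiny delete record for row position 1.
deletes = write_position_deletes("data-1.parquet", [1])

live_ids = [row["order_id"] for row in read_merge_on_read(base_files, deletes)]
print(live_ids)
```

The write is cheap because only the delete record is persisted, while every subsequent read pays for the filtering until compaction folds the deletes back into the base files.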

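The formats implement MoR differently: Iceberg writes position-delete files, Delta Lake writes deletion vectors, and Hudi appends Avro log files that background compaction later merges into the base files.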

Runnable SQL Examples and Configurations

To illustrate the difference between these mutation strategies, let us review concrete SQL configurations and operations. We use the standard analytics.orders and analytics.customers schemas to demonstrate write modes.

Merge-on-Read Configuration and MERGE INTO (analytics.orders)

Below is the Spark SQL configuration to create the analytics.orders table with Merge-on-Read enabled for updates, deletes, and merges. This configuration optimizes write performance by writing deletes as separate position-delete files rather than rewriting Parquet files.

/* Create analytics.orders table using Apache Iceberg with Merge-on-Read write mode */
CREATE TABLE local.analytics.orders (
    order_id BIGINT,
    customer_id BIGINT,
    order_date DATE,
    amount DECIMAL(10, 2),
    status STRING
) USING iceberg
TBLPROPERTIES (
    'write.update.mode' = 'merge-on-read',
    'write.delete.mode' = 'merge-on-read',
    'write.merge.mode' = 'merge-on-read'
);

/* Execute an upsert using MERGE INTO from a source delta updates table */
MERGE INTO local.analytics.orders AS target
USING (
    SELECT order_id, customer_id, order_date, amount, status 
    FROM local.analytics.orders_updates
) AS source
ON target.order_id = source.order_id
WHEN MATCHED THEN
    UPDATE SET 
        target.amount = source.amount,
        target.status = source.status,
        target.order_date = source.order_date
WHEN NOT MATCHED THEN
    INSERT (order_id, customer_id, order_date, amount, status)
    VALUES (source.order_id, source.customer_id, source.order_date, source.amount, source.status);

Copy-on-Write Configuration and UPDATE (analytics.customers)

For tables where query performance is critical and write frequency is low, Copy-on-Write is preferred. Below is the configuration for the analytics.customers table, followed by a row-level update.

/* Create analytics.customers table using Apache Iceberg with Copy-on-Write write mode */
CREATE TABLE local.analytics.customers (
    customer_id BIGINT,
    name STRING,
    email STRING,
    country STRING
) USING iceberg
TBLPROPERTIES (
    'write.update.mode' = 'copy-on-write',
    'write.delete.mode' = 'copy-on-write'
);

/* Execute an update that rewrites only the Parquet files containing the matching records */
UPDATE local.analytics.customers
SET email = 'updated_customer@example.com'
WHERE customer_id = 1045;

Ecosystem, Catalogs, and Governance

A table format does not operate in a vacuum. It requires a catalog to track table locations and enforce access controls. The catalog architecture is critical for multi-engine interoperability and preventing vendor lock-in.

1. Iceberg Catalog Model: Decentralized and Open

Iceberg defines an open REST Catalog API specification. Any service that implements this specification can function as an Iceberg catalog. Compute engines make HTTP requests to the REST service to load schemas and request atomic commits.

This open model has led to multiple implementations, such as Apache Polaris and Project Nessie.

2. Delta Lake Catalog Model: Unity Catalog

Historically, Delta Lake relied on the Hive Metastore to track directories. To provide advanced catalog capabilities, Databricks introduced Unity Catalog, a unified governance and access control layer. Unity Catalog coordinates transactions, manages schemas, and enforces access control rules. While Databricks open-sourced Unity Catalog in 2024 to address lock-in concerns, its deployment and execution remain heavily optimized for the Databricks cloud platform. Operating Delta Lake tables with comparable governance outside the Databricks ecosystem typically requires deploying Unity Catalog or relying on translation layers, which increases operational complexity compared to Iceberg's open REST API.

3. Hudi Catalog Model: Engine Metadata Table

Hudi does not require a separate transactional catalog layer to coordinate pointer swaps. Instead, the Hudi library itself tracks transactions directly in the table's .hoodie/ directory, writing metadata updates alongside commits. While Hudi can register tables with the Hive Metastore or AWS Glue for discovery, the source of truth remains the table timeline itself. This design simplifies write operations but makes multi-engine concurrency validation more complex.

Decision Framework: How to Choose

To assist data architects and database engineers in selecting the correct open table format, we have established a workload-driven decision matrix.

| Your Core Requirement | Recommended Format | Architectural Rationale |
| --- | --- | --- |
| Multi-Engine Portability & Governance | Apache Iceberg | The open REST Catalog API allows Dremio, Spark, and Trino to read/write concurrently with central RBAC via Polaris. |
| Databricks Ecosystem Integration | Delta Lake | If your organization runs primarily on Databricks, Delta Lake offers native platform speed and Unity integration. |
| High-Frequency Key Updates & CDC | Apache Hudi | Hudi's key indexes (Bloom/Bucket) and background log compaction minimize update latencies for write-heavy CDC. |
| Sub-Second BI Queries | Apache Iceberg (with Dremio) | Dremio executes vectorized Arrow queries, caches Iceberg metadata locally, and accelerates scans using reflections. |
| Version Control (Branch/Merge) | Apache Iceberg (with Nessie) | Nessie enables Git-like branching for Write-Audit-Publish patterns across multiple tables simultaneously. |

Conclusion and Next Steps

Apache Iceberg, Delta Lake, and Apache Hudi are all mature, enterprise-grade formats. However, their architectural differences are significant. Apache Iceberg represents the standard for engine-neutral lakehouse architectures, offering a clean hierarchical metadata tree and an open catalog API. Delta Lake remains the preferred choice for Spark-heavy, Databricks-centric environments. Apache Hudi provides optimized key indexing and timeline compactions for streaming CDC pipelines.

Data engineers should evaluate their compute ecosystems, update frequency requirements, and catalog governance strategies before selecting a format. In many modern environments, Iceberg has become the default choice due to its open standards, lack of vendor lock-in, and integration with high-performance query acceleration engines like Dremio.

Go Deeper on Apache Iceberg

Alex Merced has authored three hands-on books covering Apache Iceberg, the Agentic Lakehouse, and modern data architecture. Pick up a copy to master the full ecosystem.