Engineering Lakehouses with Open Table Formats

Build scalable and efficient lakehouses with Apache Iceberg, Apache Hudi, and Delta Lake

Created by Dipankar Mazumdar, Vinoth Govindarajan

Explore the world of open table formats and discover how to build scalable, production-ready lakehouses using Apache Iceberg, Apache Hudi, and Delta Lake. Gain practical experience with leading compute engines and learn to optimize performance and interoperability for real-world data needs.

Packt | Dec 2025 | 416 min

Level

Intermediate

What You Will Learn

You will start by understanding the core concepts behind lakehouse architectures and open table formats. Through practical, hands-on exercises, you will implement and optimize these formats using popular open-source tools. Step-by-step explanations guide you through advanced topics like performance tuning and data interoperability, ensuring you gain both theoretical and practical expertise.

Key Features

Build efficient lakehouses using Apache Spark, Flink, Trino, and Python tools
Apply advanced optimization techniques like pruning, partitioning, and clustering
Integrate and manage data seamlessly across formats with Apache XTable

Target Audience

Ideal for data engineers, software engineers, and data architects with a basic understanding of databases, Python, Apache Spark, Java, and SQL. If you want to deepen your skills in open table formats and transition from traditional data warehouses or lakes to modern lakehouse architectures, you will find clear guidance and actionable strategies here.

Related courses

Databricks Certified Associate Developer for Apache Spark Using Python
50 Hours of Big Data, PySpark, AWS, Scala, and Scraping
Apache Spark 3 Advance Skills for Cracking Job Interviews
PySpark and AWS: Master Big Data with PySpark and AWS
Apache Spark 3 for Data Engineering and Analytics with Python
Mastering Big Data Analytics with PySpark