
Engineering Lakehouses with Open Table Formats
Build scalable and efficient lakehouses with Apache Iceberg, Apache Hudi, and Delta Lake
Created by Dipankar Mazumdar, Vinoth Govindarajan
Explore the world of open table formats and discover how to build scalable, production-ready lakehouses using Apache Iceberg, Apache Hudi, and Delta Lake. Gain practical experience with leading compute engines and learn to optimize performance and interoperability for real-world data needs.
Packt | Dec 2025 | 416 min
What You Will Learn
You will start with the core concepts behind lakehouse architectures and open table formats. Through hands-on exercises, you will implement and optimize these formats using popular open-source tools. Step-by-step explanations then cover advanced topics such as performance tuning and data interoperability, so you build both theoretical and practical expertise.
Key Features
- Build efficient lakehouses using Apache Spark, Flink, Trino, and Python tools
- Apply advanced optimization techniques such as pruning, partitioning, and clustering
- Integrate and manage data seamlessly across formats with Apache XTable
Target Audience
Ideal for data engineers, software engineers, and data architects with a basic understanding of databases, Python, Apache Spark, Java, and SQL. If you want to deepen your skills in open table formats and transition from traditional data warehouses or lakes to modern lakehouse architectures, you will find clear guidance and actionable strategies here.





