Engineering Lakehouses with Open Table Formats

Build scalable and efficient lakehouses with Apache Iceberg, Apache Hudi, and Delta Lake

Created by Dipankar Mazumdar, Vinoth Govindarajan

Explore the world of open table formats and discover how to build scalable, production-ready lakehouses using Apache Iceberg, Apache Hudi, and Delta Lake. Gain practical experience with leading compute engines and learn to optimize performance and interoperability for real-world data needs.

Packt | Dec 2025 | 416 min

Level: Intermediate
Categories: Data Engineering, Data Warehousing and Big Data Processing Frameworks, Spark

What You Will Learn

You will start by understanding the core concepts behind lakehouse architectures and open table formats. Through practical, hands-on exercises, you will implement and optimize these formats using popular open-source tools. Step-by-step explanations guide you through advanced topics like performance tuning and data interoperability, ensuring you gain both theoretical and practical expertise.

Key Features

  • Build efficient lakehouses using Apache Spark, Flink, Trino, and Python tools
  • Apply advanced optimization techniques like pruning, partitioning, and clustering
  • Integrate and manage data seamlessly across formats with Apache XTable

Target Audience

Ideal for data engineers, software engineers, and data architects with a basic understanding of databases, Python, Apache Spark, Java, and SQL. If you want to deepen your skills in open table formats and transition from traditional data warehouses or lakes to modern lakehouse architectures, you will find clear guidance and actionable strategies here.
