Preprocessing Unstructured Data for LLMs and RAG Systems

Unlock the Power of Unstructured Data for LLMs and Retrieval-Augmented Generation Systems.

Created by Paulo Dichone

Explore the essential steps for preparing unstructured data to work effectively with large language models and retrieval-augmented generation systems. Gain practical experience handling diverse document formats and learn how to extract and organize valuable information for advanced AI applications.

Packt | Sep 2024 | 181 min

Level: Expert
Categories: LLM Engineering, Natural Language Text Processing and Generation, Python

What You Will Learn

You will work through real-world examples, starting with environment setup and moving on to hands-on preprocessing of various document types. Each section builds your skills step by step, leading up to the creation of a complete retrieval-augmented generation system using the latest techniques.

Key Features

  • Set up a robust environment for preprocessing unstructured data efficiently
  • Extract, clean, and structure content from PDFs, HTML, and PPTX files
  • Build intelligent data pipelines and RAG systems for smarter document interaction
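As a rough illustration of the extract, clean, and structure steps listed above, here is a minimal sketch for HTML input using only the Python standard library. It is not taken from the course material: the names `TextExtractor`, `html_to_text`, and `chunk_words`, as well as the word-based chunking with overlap, are assumptions chosen for illustration. A real pipeline would likely rely on a dedicated parsing library and handle PDFs and PPTX files as well.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Extract visible text and collapse runs of whitespace (the 'clean' step)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def chunk_words(text, max_words=50, overlap=10):
    """Split text into overlapping word windows (the 'structure' step).

    The window size and overlap are illustrative defaults, not recommendations.
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = (
        "<html><head><style>p{color:red}</style></head>"
        "<body><h1>Title</h1><p>Hello   world</p>"
        "<script>var x = 1;</script></body></html>"
    )
    text = html_to_text(sample)
    print(text)
    print(chunk_words(text, max_words=2, overlap=1))
```

The overlapping chunks produced this way are the kind of unit that a retrieval-augmented generation system would typically embed and index, although the chunking strategy the course actually uses may differ.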

Target Audience

Perfect for data scientists, machine learning engineers, and AI developers who already know Python and have some experience with APIs and machine learning. If you're looking to level up your ability to prepare data for LLMs and RAG systems, you'll find practical, actionable skills throughout.
