GPU-Accelerated Computing with Python 3 and CUDA

From low-level kernels to real-world applications in scientific computing and machine learning

Created by Niels Cautaerts, Hossein Ghorbanfekr

Explore how to write high-performance Python code that runs on GPUs using CUDA and modern Python libraries. Move from understanding core GPU programming concepts to applying advanced optimization techniques in scientific computing and machine learning. Gain practical skills for real-world data analysis and simulation tasks.

Packt | Mar 2026 | 534 min

Level

Intermediate

What You Will Learn

You will start by learning how to write and execute CUDA kernels in Python, then dive into memory optimization and asynchronous execution. As you progress, you will use high-level libraries to speed up numerical workflows and apply these techniques to real-world problems. Each topic is reinforced with practical examples to help you build confidence and skill.

Key Features

Write and debug custom CUDA kernels in Python to unlock GPU parallelism
Optimize memory access and scale workloads across multiple GPUs for better performance
Accelerate scientific and machine learning workflows using JAX, CuPy, and RAPIDS

Target Audience

This content is designed for Python developers, scientists, engineers, and researchers who already use libraries like NumPy or pandas and want to speed up their computations using GPUs. If you have a basic understanding of computing and want more control over performance in scientific or machine learning projects, you will benefit from these hands-on techniques.
