
GPU-Accelerated Computing with Python 3 and CUDA
From low-level kernels to real-world applications in scientific computing and machine learning
Created by Niels Cautaerts, Hossein Ghorbanfekr
Explore how to write high-performance Python code that runs on GPUs using CUDA and modern Python libraries. Move from understanding core GPU programming concepts to applying advanced optimization techniques in scientific computing and machine learning. Gain practical skills for real-world data analysis and simulation tasks.
Packt | Mar 2026 | 534 min
What You Will Learn
You will start by learning how to write and execute CUDA kernels in Python, then dive into memory optimization and asynchronous execution. As you progress, you will use high-level libraries to speed up numerical workflows and apply these techniques to real-world problems. Each topic is reinforced with practical examples to help you build confidence and skill.
Key Features
- Write and debug custom CUDA kernels in Python to unlock GPU parallelism
- Optimize memory access and scale workloads across multiple GPUs for better performance
- Accelerate scientific and machine learning workflows using JAX, CuPy, and RAPIDS
Target Audience
This content is designed for Python developers, scientists, engineers, and researchers who already use libraries like NumPy or Pandas and want to speed up their computations using GPUs. If you have a basic understanding of computing and want more control over performance in scientific or machine learning projects, you will benefit from these hands-on techniques.





