Python is Slow? Not Anymore! How I Made My Code 100x Faster with Cython (And Why You Should Too)

Picture this: you’ve written a Python script to process a massive dataset. You hit ‘Run,’ grab a coffee, and settle in for what you think will be a quick wait. But minutes turn into hours, and your code is still chugging along. Sound familiar? That was me just a few weeks ago. Frustrated and racing against a deadline, I discovered something that changed everything: Cython.

Skeptical? I was too. After all, Python is known for being slow, right? But what if I told you that with a few tweaks, you can make your CPU-heavy Python code run at close to C speed, without rewriting everything from scratch? In this post, I’ll show you how I transformed my sluggish Python script into a speed demon, and why you might want to ditch pure Python for CPU-heavy tasks.

Why is Python Slow?

Python is one of the most popular programming languages, but when it comes to execution speed, it has some well-known drawbacks:

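  • Interpreted Language: CPython first compiles your code to bytecode and then interprets that bytecode instruction by instruction, rather than compiling it to machine code ahead of time.
  • Global Interpreter Lock (GIL): Only one thread at a time can execute Python bytecode, so adding threads can’t speed up CPU-bound pure-Python code.
  • Dynamic Typing: While dynamic typing makes Python flexible, every operation must check the types of its operands at runtime, and even simple integers are full Python objects.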
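To see the first and third points in action, Python’s standard dis module can print the bytecode that CPython’s interpreter loop steps through for a function like the python_sum benchmark used below. This is a minimal sketch; the exact opcode names vary by Python version (the in-place add, for instance, is INPLACE_ADD up to 3.10 and BINARY_OP from 3.11 on).

```python
import dis


def python_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total


# Show the bytecode the interpreter executes. The loop body is re-run for
# every i, and the add has to check its operands' types at runtime each time.
dis.dis(python_sum)

# Collect the opcode names so they can be inspected programmatically.
opnames = [ins.opname for ins in dis.get_instructions(python_sum)]
print("FOR_ITER" in opnames)
```

Every one of those instructions is dispatched by the interpreter on each of the 10 million iterations, which is the overhead Cython’s typed C loops avoid.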

Despite these limitations, Python’s ease of use makes it the go-to language for many developers. But what if you could keep Python’s simplicity and get C-like performance? That’s exactly where Cython comes in.

What is Cython?

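Cython is an optimizing compiler for a superset of the Python language. It translates Python code, optionally annotated with C data types, into C, which is then compiled into a Python extension module. By declaring C types and releasing the GIL (Global Interpreter Lock) where no Python objects are involved, you can get close to pure C performance in tight, CPU-bound loops.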

With Cython, you can:

  • Speed up CPU-bound Python code.
  • Use C data types for faster numerical computations.
  • Release the GIL in code that doesn’t touch Python objects, enabling true multi-threading across CPU cores.
  • Interface with existing C/C++ libraries easily.

Benchmarking Python vs. Cython Performance

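If you’re using Google Colab, you may need to install Cython at the start of each session:
!pip install cython

To compile Cython code in Jupyter or Colab, load the extension once per session; any cell that begins with the %%cython magic command is then compiled:
%load_ext Cython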

Let’s start with a simple example: summing the integers from 0 to n - 1 (exactly what range(n) produces).
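Because range(n) stops at n - 1, the exact answer is n(n - 1)/2, which gives every version below a correctness check. The size of that answer matters too: for n = 10**7 it is far beyond the largest value a 32-bit C int can hold (2,147,483,647), so a C-typed accumulator needs a 64-bit type such as long long. The sketch below computes the expected result and, for illustration, the value a 32-bit two’s-complement accumulator would typically wrap around to. In C, signed overflow is formally undefined behavior, so compiled code is not guaranteed to produce exactly this number; wrap_int32 is a hypothetical helper for demonstration only.

```python
INT32_MAX = 2**31 - 1


def expected_sum(n):
    """Closed form for 0 + 1 + ... + (n - 1), i.e. sum(range(n))."""
    return n * (n - 1) // 2


def wrap_int32(x):
    """Hypothetical helper: reduce x into the 32-bit two's-complement range,
    mimicking the wraparound a 32-bit accumulator typically shows."""
    return (x + 2**31) % 2**32 - 2**31


n = 10**7
exact = expected_sum(n)
print(exact)              # 49999995000000
print(exact > INT32_MAX)  # True: the result does not fit in a 32-bit int
print(wrap_int32(exact))  # -2014260032
```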

Python Version (Slowest)

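import time

def python_sum(n):
    # Pure Python: every iteration creates and adds Python int objects
    total = 0
    for i in range(n):
        total += i
    return total

start = time.perf_counter()  # high-resolution timer, better suited to benchmarks than time.time()
python_sum(10**7)  # 10 million iterations
print("Python Execution Time:", time.perf_counter() - start)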
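A single timing run like the one above can be skewed by whatever else the machine happens to be doing. A slightly more robust approach, sketched below with the standard timeit module, is to run each function several times and keep the fastest run. best_time is a hypothetical helper name, not part of any library; once the Cython cells below are compiled, you can pass it cython_sum, cython_sum_nogil, or cython_sum_parallel the same way.

```python
import timeit


def python_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total


def best_time(func, *args, repeat=5):
    """Hypothetical helper: run func(*args) `repeat` times and return the fastest run in seconds."""
    timer = timeit.Timer(lambda: func(*args))
    # number=1: each measurement is one call; min() discards runs slowed by background noise
    return min(timer.repeat(repeat=repeat, number=1))


# A smaller n keeps this demo quick; use 10**7 to match the benchmark above.
print("Python best of 5:", best_time(python_sum, 10**6))
```

Taking the minimum rather than the average is a deliberate choice: slower runs usually reflect interference from other processes, not the code itself.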

Cython Optimized Version
Run this in a separate cell:

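%%cython
def cython_sum(int n):
    # C-typed variables: Cython turns this into a plain C loop.
    # total must be 64-bit: the sum for n = 10**7 (about 5e13) overflows a 32-bit int.
    cdef long long total = 0
    cdef int i
    for i in range(n):
        total += i
    return total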

start = time.perf_counter()
cython_sum(10**7)  # same 10 million iterations as the Python version
print("Cython Execution Time:", time.perf_counter() - start)

Removing GIL for Faster Execution

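The GIL (Global Interpreter Lock) allows only one thread at a time to execute Python bytecode. Because this loop works only with C variables, Cython can release the GIL around it with a with nogil: block. Releasing the GIL does not make a single call faster on its own; it lets other threads run in parallel while the loop executes, and it is a prerequisite for the multi-core prange loop in the next section.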

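%%cython
def cython_sum_nogil(int n):
    cdef long long total = 0  # 64-bit accumulator: the result overflows a 32-bit int
    cdef int i
    # Safe to release the GIL: the loop below touches only C variables, no Python objects
    with nogil:
        for i in range(n):
            total += i
    return total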

start = time.perf_counter()
cython_sum_nogil(10**7)
print("Cython (No GIL) Execution Time:", time.perf_counter() - start)
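To see what releasing the GIL buys you, the sketch below splits range(n) into contiguous chunks and sums them from a thread pool using the standard concurrent.futures module. With the pure-Python range_sum stand-in, the threads take turns holding the GIL, so you should not expect a speedup. The names range_sum and threaded_sum are hypothetical; for the threads to actually run in parallel, chunk_func would need to be a compiled function that releases the GIL for its whole loop, such as a hypothetical variant of cython_sum_nogil that takes start and stop arguments.

```python
from concurrent.futures import ThreadPoolExecutor


def range_sum(start, stop):
    """Pure-Python stand-in: sums start..stop-1 while holding the GIL throughout."""
    total = 0
    for i in range(start, stop):
        total += i
    return total


def threaded_sum(chunk_func, n, workers=4):
    """Hypothetical helper: split range(n) into contiguous chunks, one per worker,
    sum each chunk with chunk_func(start, stop) in a thread, and add the partial sums."""
    if n <= 0:
        return 0
    step = -(-n // workers)  # ceiling division so the last values are not dropped
    bounds = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda b: chunk_func(*b), bounds))


print(threaded_sum(range_sum, 10**6))  # 499999500000
```

The result is the same as a single-threaded loop; only a GIL-releasing chunk_func would turn the extra threads into extra cores.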

Parallelizing with prange (Fastest!)

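For multi-core execution, we use prange from cython.parallel. It divides the loop iterations among OpenMP threads, and because total is updated with an in-place +=, Cython treats it as a reduction variable: each thread accumulates its own partial sum, and the partial sums are combined when the loop ends. Note that prange only runs in parallel when the extension is compiled and linked with OpenMP support (the -fopenmp flag for GCC); without it, the loop simply runs on one thread.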

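%%cython --compile-args=-fopenmp --link-args=-fopenmp
from cython.parallel import prange
cimport cython

@cython.boundscheck(False)  # these two directives only affect indexing into arrays/memoryviews,
@cython.wraparound(False)   # so they have no effect on this particular loop
def cython_sum_parallel(int n):
    cdef long long total = 0  # 64-bit accumulator to avoid overflow
    cdef int i
    with nogil:
        # "total += i" makes total a reduction: each thread keeps a private partial sum.
        # A static schedule suits this loop because every iteration costs the same;
        # 'dynamic' would hand out iterations one at a time and add heavy overhead.
        for i in prange(n, schedule='static', num_threads=4):
            total += i
    return total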

start = time.perf_counter()
cython_sum_parallel(10**7)
print("Cython (Parallel No GIL) Execution Time:", time.perf_counter() - start)
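Parallel reductions and fixed-width C types are exactly where silent wrong answers creep in, so it is worth checking every version against the closed-form result n(n - 1)/2 before comparing timings. A minimal sketch; check_sums is a hypothetical helper, and in the notebook you would add cython_sum, cython_sum_nogil, and cython_sum_parallel to the dictionary once they are compiled.

```python
def python_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total


def check_sums(impls, n):
    """Hypothetical helper: return {name: (result, matches_expected)} for each implementation."""
    expected = n * (n - 1) // 2
    report = {}
    for name, func in impls.items():
        result = func(n)
        report[name] = (result, result == expected)
    return report


# In the notebook, also include e.g. "cython_sum_parallel": cython_sum_parallel.
for name, (result, ok) in check_sums({"python_sum": python_sum}, 10**6).items():
    print(f"{name}: {result} {'OK' if ok else 'MISMATCH'}")
```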

Conclusion: When to Use Cython?

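  • Use Cython when performance matters, especially for CPU-heavy loops over C-typed variables.
  • Release the GIL (with nogil) around code that doesn’t touch Python objects, so other threads can run at the same time.
  • Use prange, compiled with OpenMP enabled, to spread a loop across multiple CPU cores.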

If you need faster numerical computations, also check out Numba (JIT compilation); but for low-level control and easy access to C/C++ libraries, Cython is hard to beat!

Original article: Python is Slow? Not Anymore! How I Made My Code 100x Faster with Cython (And Why You Should Too)
