Let’s be honest: Python is slow. When I say Python, I mean CPython, its reference implementation, written in C. This is where PyPy comes into play. It is a Python runtime, written in Python (!), which on average performs about 4.4 times faster than CPython. How? Read on.
It is as simple as that. Competitive programmers were among the first to harness the power of PyPy: sometimes a solution written in Python takes too long and fails under CPython, but the same code running in PyPy passes with flying colours. But how?
To answer that, let’s start with interpreted languages, which have some clear advantages:
- Interpreters are incredibly easy to write
- Powerful metaprogramming capabilities
- No compile-time fails
Of course, there are a couple of downsides:
- Severe performance overhead of parsing source code at runtime
- No compile-time fails
You can see I included “no compile-time fails” in both advantages and disadvantages. There may be times when you want different behaviour (e.g. when you are prototyping vs. running in production), but I am still inclined to treat it as a disadvantage.
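To make the “no compile-time fails” point concrete, here is a minimal sketch. The function and the misspelled name are made up for illustration. The typo in the second branch goes unnoticed until that branch actually runs:

```python
def describe(n):
    """Return a label for n; the negative branch contains a deliberate typo."""
    if n >= 0:
        return "non-negative"
    # 'labell' is never defined. A statically checked language would reject
    # this at compile time, but Python only notices when this line executes.
    return labell


print(describe(5))  # works: the broken branch is never reached

try:
    describe(-1)
except NameError as exc:
    print("failed only at runtime:", exc)
```

Handy while prototyping, painful when the broken branch is first reached in production.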
PyPy does things a bit differently. It is not a pure interpreter: it implements tracing just-in-time (JIT) compilation.
Just-in-time compilation is the middle ground between interpretation and conventional ahead-of-time compilation. Just-in-time compilers do not execute the source code itself, but instead generate a set of lower-level instructions (machine code, most often) that are executed almost immediately.
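The contrast can be sketched with a toy expression language. Everything here (the tuple format, `interpret`, `to_source`, `jit_compile`) is invented for illustration, and Python’s built-in `compile()` merely stands in for the machine-code generation a real JIT performs:

```python
import operator

# A tiny expression language: nested tuples such as ("add", "x", ("mul", "x", 2)).
OPS = {"add": (operator.add, "+"), "mul": (operator.mul, "*")}


def interpret(node, env):
    """Walk the tree on every call: this is what a pure interpreter does."""
    if isinstance(node, tuple):
        op, left, right = node
        return OPS[op][0](interpret(left, env), interpret(right, env))
    if isinstance(node, str):
        return env[node]  # variable lookup
    return node  # numeric literal


def to_source(node):
    """Translate the tree into Python source text, once."""
    if isinstance(node, tuple):
        op, left, right = node
        return f"({to_source(left)} {OPS[op][1]} {to_source(right)})"
    return str(node)


def jit_compile(node, arg_names):
    """Build a callable from the tree; later calls skip the tree walk."""
    source = f"lambda {', '.join(arg_names)}: {to_source(node)}"
    return eval(compile(source, "<toy-jit>", "eval"))


expr = ("add", "x", ("mul", "x", 2))
fast = jit_compile(expr, ["x"])
print(interpret(expr, {"x": 5}), fast(5))  # prints: 15 15
```

The translation cost is paid once in `jit_compile`; every later call to `fast` runs the generated code directly.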
The difference lies in where and when compilation happens. In compiled languages (C, C++, Rust) the compilation phase is confined strictly to the development environment. It produces a runnable binary, which is then sent to production. In interpreted languages, quite the opposite is true: source code (after *fication, hello JS) is pushed in its entirety to production, where an interpreter will execute it. JIT languages also ship source code (or bytecode, like Java or C#), but it is compiled and run like a regular compiled language rather than interpreted line by line.
It is not that one approach is better than the other. Every use-case will dictate the correct choice by its unique needs. But if performance is critical and you feel at home in the Python interpreter, PyPy will be your choice.
Tracing Just-in-time compilation
Just like compilation or interpretation, there are different ways of implementing just-in-time compilation. The conventional one is method/function scoped: when your code calls a function, the JIT compiler will get its source code, compile, and serve the executable binary. PyPy takes a slightly different approach, as dictated by Python’s unique features and use-cases.
Instead of working per method call, PyPy’s compiler looks at loops. Since Python is used heavily for data science, machine learning, and other work that leans on advanced algorithms and data structures, this made the most sense. In short, PyPy is an optimization layer on top of Python.
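If you want to see the effect for yourself, a loop-heavy script like the sketch below is the kind of code a loop-oriented JIT is aimed at. The function names and the iteration count are arbitrary choices for illustration. The script reports which runtime executed it, so you can run the same file under both:

```python
import platform
import time


def sum_of_squares(n):
    """A deliberately loop-heavy function with no I/O and no C extensions."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def benchmark(n=5_000_000):
    """Return the result and the wall-clock time taken to compute it."""
    start = time.perf_counter()
    result = sum_of_squares(n)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    result, elapsed = benchmark()
    # python_implementation() reports e.g. "CPython" or "PyPy".
    print(f"{platform.python_implementation()}: {result} in {elapsed:.3f}s")
```

Assuming you save it as `loop_bench.py` (a made-up file name), run it once with `python3 loop_bench.py` and once with `pypy3 loop_bench.py` and compare the timings on your own machine.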
PyPy does not deal strictly with loops as you perceive them. Apart from regular while constructions, PyPy will optimize arbitrary blocks of code, if it detects that the compilation effort will be worth it.
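The “is it worth compiling?” decision can be sketched as a simple counter. This is a toy model, not PyPy’s actual machinery: `ToyLoopJit`, `HOT_THRESHOLD` and the loop id are all hypothetical, and “compiling” here just means caching the Python function.

```python
HOT_THRESHOLD = 3  # hypothetical value; real JITs use far larger, tuned thresholds


class ToyLoopJit:
    """Counts how often each loop body runs and 'compiles' the hot ones."""

    def __init__(self):
        self.counters = {}  # loop id -> number of interpreted iterations
        self.compiled = {}  # loop id -> stand-in for generated machine code

    def run_iteration(self, loop_id, body, state):
        if loop_id in self.compiled:
            return self.compiled[loop_id](state)  # fast path
        self.counters[loop_id] = self.counters.get(loop_id, 0) + 1
        if self.counters[loop_id] >= HOT_THRESHOLD:
            # A real tracing JIT would record the operations executed in this
            # iteration and emit optimized machine code. We only cache the body.
            self.compiled[loop_id] = body
        return body(state)  # slow, "interpreted" path


def body(state):
    """One loop iteration: add the counter to the accumulator, then advance."""
    acc, i = state
    return acc + i, i + 1


jit = ToyLoopJit()
state = (0, 0)
while state[1] < 10:
    state = jit.run_iteration("sum-loop", body, state)

print(state[0], jit.counters["sum-loop"])  # prints: 45 3
```

Only the first three iterations go through the counted slow path. After that, the loop is considered hot and every iteration takes the fast path.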
Of course, PyPy is yet another tool that will have its downsides as well. Even though you get a tremendous performance increase, do keep in mind:
- Not all Python is supported. Most of your code is, though. But if you rely on low-level CPython implementation details, or have C extensions such as Cython bindings, your code may not work or may run slower.
- Back to the future. PyPy currently implements Python 3.6, while CPython is stable at 3.8 right now. But backtracking is something we Pythonistas are very good at (shoutout to Python v2 devs right now)
- Optimizations are good, but this is no excuse to write bad code. If your code is unreadable by humans, how do you expect PyPy to make sense of it?
- The GIL is still there. If you do some heavy multi-threading, read on for other implementations.
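One example of the “low-level CPython implementation details” mentioned in the list above is reference counting. CPython usually cleans up an object the moment its last reference disappears, while PyPy uses a different garbage collector, so cleanup may happen later. The sketch below (the helper and the file name are made up) shows a style that does not depend on either behaviour:

```python
import os
import tempfile


def write_report(path, lines):
    """Write lines to path, closing the file deterministically."""
    # The 'with' block closes the file on every runtime. Writing
    # open(path, "w").write(...) and hoping the file gets flushed and closed
    # right away leans on CPython's reference counting instead.
    with open(path, "w") as handle:
        for line in lines:
            handle.write(line + "\n")


path = os.path.join(tempfile.mkdtemp(), "report.txt")
write_report(path, ["first", "second"])

with open(path) as handle:
    print(handle.read())
```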
As with any tool, you should consider all ins and outs before adopting it. But the next time you log in to CodeForces for a challenge, try PyPy out. There is a chance your O(n^3) monstrosity will pass, where only an O(n log n) solution would pass in CPython.
While the source code of CPython and PyPy is out of scope for this more general article, for inquisitive readers I found these files that implement the factorial function in CPython (C code) and PyPy (Python code).
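For a feel of what a factorial looks like when written in Python rather than C, here is a naive sketch. It is not taken from either implementation mentioned above and makes no attempt to match their optimizations:

```python
def factorial(n):
    """Naive iterative factorial for non-negative integers."""
    if n < 0:
        raise ValueError("factorial() not defined for negative values")
    result = 1
    for factor in range(2, n + 1):
        result *= factor
    return result


print(factorial(10))  # prints: 3628800
```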
Other than CPython and PyPy there are other notable Python implementations:
- Stackless Python. This is a variant of CPython that adds lightweight microthreads (tasklets) and avoids depending on the C call stack. It still has a GIL. Its most notable use is as a backend for the EVE Online game.
- IronPython is a .NET implementation of Python, which gives very easy interop for your Python and C# code.
- Jython (formerly JPython) is the same thing, but with Java. Like IronPython, it has no GIL.
Thank you for reading, I hope you liked my article. Let me know about your experiences with PyPy in the comments!