Part 2 - A quick history of python web programming
Let’s look at how python’s integration into webservers has evolved over time. The fact that we can now use simple web frameworks like FastAPI to quickly spin up powerful python web applications was far from a given for a long time. Many of the things I’ll talk about here are also mentioned in a nice python docs howto on “how to use python in the web”. Some of the information there is a bit dated, but for me it was a great introduction to the topic.
In the mid-to-late 90s a popular approach was CGI (Common Gateway Interface). It is a protocol that extends the functionality of webservers beyond simply looking up and returning static files. With CGI it becomes possible to also call scripts (e.g. python, PHP, perl) and then to respond back to the client with the output of those scripts.
This new behavior had to be added to existing webservers (usually through extension modules), but it’s a one-time addition which then unlocks limitless possibilities to interact with scripts.
Roughly, the way it works is that the webserver sets all of the request parameters (HTTP method, headers, query parameters, …) as environment variables and then executes the script. The script can then read those parameters from the environment variables. The webserver can also write an (optional) request body to the script via stdin. Based on all those inputs, the script now dynamically creates an output. The webserver then reads this output from the stdout of the script and formats it into a proper HTTP response to send back to the client.
The flow looks roughly like the following.
And a request/response sequence through the system could look like this:
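To make the webserver's side of this a bit more concrete, here is a minimal sketch in python. It is not how any particular webserver implements CGI; the function name `run_cgi_script`, the `demo` helper and the little demo script are made up for illustration. The environment variable names (`REQUEST_METHOD`, `QUERY_STRING`, ...) are the standard CGI ones.

```python
import os
import subprocess
import sys
import tempfile


def run_cgi_script(script_path, method, query_string, body=b""):
    """Run one CGI script for one request and return (headers, body)."""
    # Request information travels to the script as environment variables.
    env = dict(os.environ)
    env.update({
        "GATEWAY_INTERFACE": "CGI/1.1",
        "REQUEST_METHOD": method,
        "QUERY_STRING": query_string,
        "CONTENT_LENGTH": str(len(body)),
    })
    # A fresh interpreter process is started for every single request.
    # The (optional) request body is written to the script's stdin.
    result = subprocess.run(
        [sys.executable, script_path],
        input=body, env=env, capture_output=True, check=True,
    )
    # The script's stdout is: header lines, an empty line, then the body.
    output = result.stdout.replace(b"\r\n", b"\n")
    raw_headers, _, response_body = output.partition(b"\n\n")
    headers = dict(
        line.split(": ", 1) for line in raw_headers.decode().splitlines()
    )
    return headers, response_body


# Hypothetical demo script that just echoes the query string back.
DEMO_SCRIPT = """\
import os
print("Content-Type: text/plain")
print()
print("query was: " + os.environ["QUERY_STRING"])
"""


def demo():
    """Write the demo script to a temp dir and send one request through it."""
    with tempfile.TemporaryDirectory() as tmp:
        script_path = os.path.join(tmp, "echo.py")
        with open(script_path, "w", encoding="utf-8") as f:
            f.write(DEMO_SCRIPT)
        return run_cgi_script(script_path, "GET", "a=1&b=2")


if __name__ == "__main__":
    print(demo())
```

A real webserver would additionally turn the returned headers and body into a full HTTP response (status line and so on) before sending it to the client.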
Now you could e.g. have an endpoint /sum which, when called as e.g. /sum?a=1&b=2, makes the webserver put the query string into an environment variable and trigger some script, e.g. sum.py. The script reads the query parameters from that environment variable, calculates the sum and prints the result to stdout. The webserver then reads the output from the script's stdout and responds to the client with a plain-text 3 in the body. At the same time, you could still have your normal static file behavior for other endpoints.
The python script could simply look like this:
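    #!/usr/bin/env python3
    # sum.py
    import os
    from urllib.parse import parse_qs  # cgi.parse_qs no longer exists in current python

    # The webserver passes the raw query string (e.g. "a=1&b=2") via the environment.
    parameters = parse_qs(os.environ.get("QUERY_STRING", ""))

    # CGI output: header lines, an empty line, then the body.
    print("Content-Type: text/plain;charset=utf-8")
    print()
    # parse_qs maps every key to a list of values, so take the first one.
    print(int(parameters["a"][0]) + int(parameters["b"][0]))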
This collection of scripts triggered by different endpoints would essentially be your “web application”, and I’ll use the terms “script” and “web application” interchangeably throughout this post because the boundary is blurry.
The obvious downsides of CGI are:
- a new process (e.g. python interpreter) is started every time a CGI endpoint is called
- impossible to retain memory between requests
- impossible to pre-load bigger objects into memory beforehand
For all of those reasons
CGI tends to be quite slow and is definitely not a popular choice anymore today.
FastCGI is a standard that tries to address the shortcomings of CGI. Instead of spawning a new interpreter process for every request, it keeps a pool of one or more long-running processes that the webserver forwards requests to and receives responses from. This internal communication between the webserver and the web application is done via sockets (read-write connections between computers/processes to exchange arbitrary raw data; we’ll discuss that in more detail later in the series).
FastCGI is available as a module for many popular and mature servers, such as
nginx. It uses a specific protocol to communicate the request/response information between the webserver and the web application over a socket. Therefore the web application needs to be wrapped in a
FastCGI wrapper. This means that both the webserver and the web application need to be adjusted specifically for
FastCGI (as compared to
CGI where the web application part just looked like a plain-old script).
Here is also a great blogpost with some nice diagrams and example code explaining
FastCGI further. If you want to read about the protocol details and how exactly
FastCGI exchanges information over a socket, read here.
The upside is that you now have a persistent pool of interpreters running and can pass requests to them (and get responses) via the socket. This alleviates all of the problems we saw with
CGI. One downside is that you now have to specifically adjust your scripts to be compatible with the
FastCGI protocol. And once you’ve done that, you’re kind of locked into this way of deploying your web application.
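The core idea (a worker that is started once and then answers many requests over a socket) can be sketched in a few lines of python. To be clear, this is not the FastCGI protocol: the "protocol" here is a made-up one-line-per-request format, a thread stands in for the separate worker process, and `worker` and `demo` are hypothetical names.

```python
import socket
import threading
from urllib.parse import parse_qs


def worker(conn: socket.socket) -> None:
    """Long-lived worker: started once, then answers many requests."""
    handled = 0  # state that survives between requests (impossible with CGI)
    with conn, conn.makefile("r", encoding="utf-8") as requests:
        for line in requests:  # toy protocol: one query string per line
            params = parse_qs(line.strip())
            total = int(params["a"][0]) + int(params["b"][0])
            handled += 1
            conn.sendall(f"{total} (request #{handled})\n".encode("utf-8"))


def demo() -> list[str]:
    """Play the webserver: forward two requests to the same worker."""
    server_side, worker_side = socket.socketpair()
    threading.Thread(target=worker, args=(worker_side,), daemon=True).start()
    responses = []
    with server_side, server_side.makefile("r", encoding="utf-8") as replies:
        for query in ("a=1&b=2", "a=10&b=20"):
            server_side.sendall((query + "\n").encode("utf-8"))
            responses.append(replies.readline().strip())
    return responses


if __name__ == "__main__":
    print(demo())
```

The request counter in the responses shows that both requests were handled by the same running worker, which is exactly what plain CGI cannot give you.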
Another approach for solving the problem is
mod_python. It is a module/extension for the popular
Apache webserver written in
C. It embeds a python interpreter right in the
Apache process that handles requests/responses. So instead of having different processes for the webserver and the interpreter and having them communicate e.g. via a socket (as in
FastCGI), here they are combined in the same process.
On an incoming request, the webserver then calls the necessary script using the existing interpreter. Request information is handed into that call and the dynamically created response is returned.
You might wonder how that is even possible. How can a
C program interact with python code, call a function with inputs, get outputs out, etc.?
The answer is python’s C API. Since python itself is implemented in
C, there are ways to call python code from
C. You can e.g. dynamically load a module by name, grab a function in that module by name, call that function with some input parameters and get a return value back. We’re not going to go into more detail here, but this is basically how the interaction goes.
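To get a feeling for that "load a module by name, grab a function by name, call it" sequence, here is the same idea written in python itself. The C API has counterparts for each step (for example `PyImport_ImportModule`, `PyObject_GetAttrString` and `PyObject_CallObject`), although I'm not claiming that mod_python uses exactly these calls. The helper name `call_by_name` is made up for this sketch.

```python
import importlib


def call_by_name(module_name, function_name, *args):
    """Import a module by name, look up a callable by name and call it."""
    module = importlib.import_module(module_name)  # load the module by name
    function = getattr(module, function_name)      # grab the function by name
    return function(*args)                         # call it, return the result


if __name__ == "__main__":
    # Calls operator.add(1, 2) without ever writing "operator.add" in code.
    print(call_by_name("operator", "add", 1, 2))
```

The embedding webserver only needs to know two strings (module and function name) from its configuration to be able to run your code.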
The “protocol” in this case is that you need to write your script in exactly the way mod_python expects, such that it finds the correct callable, can call it with specific input values and get an expected return value out. The mod_python documentation I linked above is actually quite readable, so I encourage you to check it out, especially this “so what does mod_python do?” part. Here is also another great resource that explains what is actually going on while mod_python handles a request/response cycle and which parts are run in C and which in python.
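As a rough sketch of what such a script looks like: mod_python calls a function named `handler` with a request object, you set `req.content_type`, write the body with `req.write(...)` and return a status constant (`apache.OK` in a real handler, imported via `from mod_python import apache`). Since mod_python is not installed on most machines today, the sketch below uses a hypothetical `FakeRequest` class and a local `OK` constant as stand-ins, so it can be run on its own.

```python
import io
from urllib.parse import parse_qs

OK = 0  # stand-in for the apache.OK constant a real handler would return


class FakeRequest:
    """Hypothetical stand-in for the request object mod_python passes in."""

    def __init__(self, args):
        self.args = args          # raw query string, e.g. "a=1&b=2"
        self.content_type = None  # set by the handler
        self._out = io.StringIO()

    def write(self, text):
        self._out.write(text)

    def body(self):
        return self._out.getvalue()


def handler(req):
    """The callable that the webserver looks up and calls for each request."""
    params = parse_qs(req.args)
    req.content_type = "text/plain"
    req.write(str(int(params["a"][0]) + int(params["b"][0])))
    return OK


if __name__ == "__main__":
    request = FakeRequest("a=1&b=2")
    status = handler(request)
    print(status, request.content_type, request.body())
```

Compare this with sum.py from the CGI section: the logic is the same, but the inputs and outputs now travel through a request object instead of environment variables and stdout.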
Again, this solves the problems of
CGI because you now have a persistent python interpreter embedded right into your webserver process. But it also has the downside (similar to
FastCGI) that you now have to adjust your web application to adhere to the kind of interface that
mod_python expects and can interact with. You couldn’t trivially switch your application deployment between FastCGI and mod_python. Another downside is that mod_python is only available for Apache. You’re out of luck if you e.g. want to put your application behind a different webserver.
So let’s summarize for now. We can have webservers that are
- written in basically any language and invoke python via a gateway protocol (CGI, FastCGI)
- written in C but can embed python and interact with python code directly using the python C API (mod_python)
We could also have a webserver that is written purely in python (we’re actually going to do that later in this series) and can just import and run the web application directly. However, at the time when all of these different approaches were fighting for attention (late 90s, early 2000s), the most potent webservers were written in
C and compatibility with them was important.
The biggest issue with all of these approaches is that they are mutually incompatible. If your web application is written to be deployed with
mod_python, you can’t easily switch to some other webserver and deploy it using
FastCGI. It was kind of a mess, and web development in python around 2000 suffered because of it. But luckily some smart people put their heads together and proposed
WSGI in 2003 to bring some order into this space. And that is the topic of the next post.