Part 8 - Our own WSGI framework

April 01, 2020

(The changes introduced in this post start here.)

In our quest to replace all the parts of a Python web application with our own lackluster implementations, we have now come to the grand WSGI finale: a WSGI application. Or rather: a WSGI framework, which we’ll then build a WSGI application with.

We’ve seen a super simple WSGI application before. It can be as easy as a function that takes the environ dict and start_response callable as inputs, creates some useful response, calls start_response with the response status and headers and finally returns an iterable with the (optional) response body.

def application(environ, start_response):
    status = "200 OK"
    response_headers = [("Content-type", "text/plain")]
    start_response(status, response_headers)
    return [b"Hello World!"]

It’s not too hard to imagine how you could gradually extend that to build up quite complex web applications.

def application(environ, start_response):
    if environ["PATH_INFO"] == "/":
        status = "200 OK"
        headers = [("Content-type", "text/plain")]
        body = b"Hello from /"
    elif environ["PATH_INFO"] == "/create" and environ["REQUEST_METHOD"] == "POST":
        status = "200 OK"
        headers = [("Content-type", "text/plain")]
        data = environ["wsgi.input"].read()
        body = b"Hello from /create called with data " + data
    else:
        status = "404 NOT FOUND"
        headers = [("Content-type", "text/plain")]
        body = b""
    start_response(status, headers)
    return [body]

It’s also not too hard to see how this could gradually get you into trouble. First of all, you need to have intimate knowledge of the WSGI specification, which shouldn’t really be necessary for someone just wanting to build a web application. Also, you’ll want to have an easier way to access the request information and define endpoints. So, sooner or later you’ll define abstractions on top of this basic WSGI application pattern to make your life easier and keep the code maintainable.

The job of a WSGI framework

In order to prevent people from constantly reinventing the wheel, WSGI application frameworks exist. They basically help you to focus on writing the actual business logic of your web application and try to abstract away the more low-level parts of what it means to interact with a WSGI server.

The example app above, in a framework like flask, would look much simpler and more readable:

from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def root():
    return "Hello from /"

@app.route("/create", methods=["POST"])
def create():
    return f"Hello from /create with body data {request.data}"

In that example you can already see some of the jobs the framework is doing:

- gives you an easy way to define path operations (as decorated functions)
  - a path operation is a behavior that should happen when calling a certain path/endpoint on the application
- lets you limit the allowed HTTP methods on a certain path operation
- gives you much easier access to the request information
- automatically converts the path operation function return to a correct HTTP response
  - uses a default status and headers if not specified otherwise
  - return value of the path operation function becomes response body
- returns a 404 NOT FOUND response when trying to call unavailable path operations

There are also some other things a WSGI framework usually does that can’t be seen in this tiny example, e.g.:

- provides other kinds of responses than just plain-text and adjusts the headers accordingly
- lets you manually set the response status and headers
- lets you register error handlers, such that an exception raised in the application code results in a specific HTTP response being sent back to the client
- provides some way to modularize bigger codebases (e.g. via Blueprint in flask)
- provides abstractions for more advanced HTTP features like cookies and sessions or authentication flows

In this post we’ll build a tiny flask-like WSGI application framework that ticks off some, but not all, of these boxes.

Some preparations in the codebase

To start with, let’s move all server-related files into a package ./wsgi/server/ and add __init__.py files to both levels of that new module structure. In all server files we now also need to change the local imports to relative imports. We also turn the serve_forever() function in ./wsgi/server/server.py into a WSGIServer class with a serve_forever() method.

#./wsgi/server/server.py
...
class WSGIServer:
    def __init__(self, host: str, port: int, app):
        self.host = host
        self.port = port
        self.app = app

    def serve_forever(self):
        server_socket = socket.socket()
        server_socket.bind((self.host, self.port))
        server_socket.listen(1)

        while True:
            client_socket, address = server_socket.accept()
            print(f"Socket established with {address}.")
            session = Session(client_socket, address, self.app)
            t = threading.Thread(target=session.run)
            t.start()

As you can see, the WSGIServer now accepts an app input, which is how we’ll provide the WSGI application from now on.

We also remove the script-part of ./wsgi/server/server.py that was used to start a test server and instead put that into a top-level ./run.py script. In there we also put what was previously in app.py (the test flask app) to have a combined script that defines a little WSGI application and starts serving it through our WSGI server.

#./run.py
from flask import Flask, request
from wsgi.server import WSGIServer


app = Flask(__name__)


@app.route("/", methods=["GET"])
def root():
    print("Called root endpoint.")
    return "hello from /"


@app.route("/create", methods=["POST"])
def create():
    print(f"Called create endpoint with data {request.data}.")
    return "hello from /create"


if __name__ == "__main__":
    server = WSGIServer("127.0.0.1", 5000, app)
    server.serve_forever()

The package can be installed via pip install -e . through a rudimentary new top-level ./setup.py.

#./setup.py
from setuptools import setup, find_packages


setup(
    name="wsgi",
    description="A tutorial implementation of a WSGI server and application.",
    version="0.0.1",
    packages=find_packages(),
)

After all those changes you can test that you get the previous behavior by running python run.py. The details of this refactor are probably best followed in this commit.

Easily register path operations on the app

Now let’s start with actually creating the application framework. We create a new module ./wsgi/application/, create an __init__.py and a new application.py file. In there will be our framework skeleton:

from typing import Callable
from dataclasses import dataclass


class WSGIApplication:
    def __init__(self):
        self.path_operations = dict()

    def _register_path_operation(
        self, path: str, http_method: str, func: Callable
    ):
        po = PathOperation(path, http_method)
        self.path_operations[po] = func

    def _create_register_decorator(self, path: str, http_method: str):
        def decorator(func: Callable):
            self._register_path_operation(path, http_method, func)
            return func

        return decorator

    def get(self, path: str):
        return self._create_register_decorator(path, "GET")

    def post(self, path: str):
        return self._create_register_decorator(path, "POST")

    # enable the class instance to be used as a callable
    def __call__(self, environ, start_response):
        po = PathOperation(environ["PATH_INFO"], environ["REQUEST_METHOD"])
        func = self.path_operations.get(po)
        if func is None:
            status = "404 NOT FOUND"
            headers = [("Content-type", "text/plain")]
            body = b""
        else:
            status = "200 OK"
            headers = [("Content-type", "text/plain")]
            body = func().encode("utf-8")
        start_response(status, headers)
        return [body]


# frozen so that instances are hashable and can be used as keys in the path operation dict
@dataclass(frozen=True, eq=True)
class PathOperation:
    path: str
    http_method: str

In contrast to the super simple, function-based WSGI applications we’ve seen so far, this one will actually be implemented as a class. You can see the familiar application interface in the __call__ magic method. You can register new path operation functions on the app using get and post as parametrized decorators (a function that returns a decorator). This is, of course, slightly different to the flask way where a single path operation function could be in charge of responding to all kinds of HTTP methods. But I like it better this way.

On registering a new function, it is simply put into a dict that maps from a unique path+http_method to the actual function to be called. When the application is called, it checks whether the request path+http_method matches any existing path operation function. If not it returns a 404 NOT FOUND response, if yes it calls the path operation function and returns a 200 OK plain-text response with the function return as the body.
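
To see the registration and lookup mechanism in isolation, here is a stripped-down sketch. It is not the framework code from above: it uses a plain (path, method) tuple as the dict key instead of the PathOperation dataclass, and it calls the app directly with a hand-built environ and a stand-in start_response instead of going through a server. The names MiniApp and fake_start_response are made up for this illustration.

```python
from typing import Callable, Dict, Tuple


class MiniApp:
    def __init__(self):
        # maps (path, http_method) -> path operation function
        self.path_operations: Dict[Tuple[str, str], Callable] = {}

    def get(self, path: str):
        # parametrized decorator: returns the decorator that does the registering
        def decorator(func: Callable):
            self.path_operations[(path, "GET")] = func
            return func

        return decorator

    def __call__(self, environ, start_response):
        key = (environ["PATH_INFO"], environ["REQUEST_METHOD"])
        func = self.path_operations.get(key)
        if func is None:
            start_response("404 NOT FOUND", [("Content-type", "text/plain")])
            return [b""]
        start_response("200 OK", [("Content-type", "text/plain")])
        return [func().encode("utf-8")]


app = MiniApp()


@app.get("/")
def root():
    return "hello from /"


captured = []


def fake_start_response(status, headers):
    # a real server would write the status line and headers; we only record the status
    captured.append(status)


body_ok = app({"PATH_INFO": "/", "REQUEST_METHOD": "GET"}, fake_start_response)
body_missing = app({"PATH_INFO": "/nope", "REQUEST_METHOD": "GET"}, fake_start_response)
```

Calling the app like this is also a cheap way to test path operations without opening a socket.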

Technically we should be returning a 405 METHOD NOT ALLOWED error for a request to an existing path with the wrong HTTP method. But we want to keep it very simple here, so just be aware that this is not 100% correct behavior.
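
If you wanted to add the 405 case, one possible approach is sketched below. It assumes the path operations are stored under (path, method) tuples, which is a simplification of the dataclass keys used in the framework, and the helper name resolve is hypothetical. HTTP expects a 405 response to carry an Allow header listing the methods that are supported for the path, so the sketch returns that as well.

```python
from typing import Callable, Dict, List, Optional, Tuple


def resolve(
    path_operations: Dict[Tuple[str, str], Callable],
    path: str,
    http_method: str,
) -> Tuple[str, Optional[Callable], List[Tuple[str, str]]]:
    """Return (status, function or None, extra headers) for a request."""
    func = path_operations.get((path, http_method))
    if func is not None:
        return "200 OK", func, []
    # the path exists, but only for other methods -> 405 plus an Allow header
    allowed = sorted(m for (p, m) in path_operations if p == path)
    if allowed:
        return "405 METHOD NOT ALLOWED", None, [("Allow", ", ".join(allowed))]
    # nothing registered under this path at all -> 404
    return "404 NOT FOUND", None, []


ops = {("/", "GET"): lambda: "root", ("/create", "POST"): lambda: "create"}
status_405, _, headers_405 = resolve(ops, "/create", "GET")
status_404, _, _ = resolve(ops, "/nope", "GET")
status_200, func_200, _ = resolve(ops, "/", "GET")
```

The linear scan over all registered operations is fine for a handful of routes; a real framework would more likely index the operations by path first.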

We can now actually already replace flask in our test application with our own little mini-framework. Change the top-level ./run.py to the following:

#./run.py
from wsgi.server import WSGIServer
from wsgi.application import WSGIApplication

app = WSGIApplication()


@app.get("/")
def root():
    print("Called root endpoint.")
    return "hello from /"


@app.post("/create")
def create():
    print(f"Called /create endpoint.")
    return "hello from /create"


if __name__ == "__main__":
    server = WSGIServer("127.0.0.1", 5000, app)
    server.serve_forever()

Looks all very familiar. You can test the behavior using curl.

Bundling request data in a single object

One major thing that is not possible yet is to actually get access to the request data (e.g. query parameters, the request body, headers, …) in order to be able to act on it.

Let’s solve this with a new Request class in ./wsgi/application/request.py.

#./wsgi/application/request.py
from typing import Dict
from dataclasses import dataclass


@dataclass
class Request:
    query: Dict[str, str]
    body: bytes
    headers: Dict[str, str]

    @classmethod
    def from_environ(cls, environ: Dict):
        query = {}
        qs = environ.get("QUERY_STRING", "")
        if qs:
            # partition() tolerates entries without "=" or with extra "=" signs
            query = dict(entry.partition("=")[::2] for entry in qs.split("&"))
        body = environ["wsgi.input"].read()
        headers = {
            k[len("HTTP_"):]: v  # strip only the leading "HTTP_" prefix
            for k, v in environ.items()
            if k.startswith("HTTP_")
        }
        return cls(query, body, headers)

Nothing overly fancy here. Just a little dataclass with a factory method that allows it to be instantiated based on an environ dict. It processes the query string, the body and the headers and saves them in the specified format.
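
The hand-rolled parsing keeps things easy to follow, but it does not decode percent-escapes in the query string, and it reads the input stream until it ends rather than stopping after CONTENT_LENGTH bytes. Below is a sketch of a slightly more defensive variant built on the standard library’s urllib.parse.parse_qs. The function name parse_environ is made up for this illustration, and keeping only the last value for a repeated query key is an arbitrary simplification.

```python
import io
from urllib.parse import parse_qs


def parse_environ(environ):
    # parse_qs decodes percent-escapes and returns a list of values per key;
    # we keep only the last value to stay with a simple str -> str mapping
    raw_query = parse_qs(environ.get("QUERY_STRING", ""))
    query = {key: values[-1] for key, values in raw_query.items()}

    # only read as many bytes as the client announced
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    body = environ["wsgi.input"].read(length) if length > 0 else b""

    # strip only the leading "HTTP_" prefix from the header keys
    headers = {
        key[len("HTTP_"):]: value
        for key, value in environ.items()
        if key.startswith("HTTP_")
    }
    return query, body, headers


environ = {
    "QUERY_STRING": "a=1&b=hello%20world",
    "CONTENT_LENGTH": "5",
    "wsgi.input": io.BytesIO(b"12345extra"),
    "HTTP_USER_AGENT": "curl",
    "HTTP_X_HTTP_METHOD": "PATCH",
}
query, body, headers = parse_environ(environ)
```

Note that parse_qs drops keys with blank values unless you pass keep_blank_values=True.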

An instance of this class is created in the application every time the application is called (i.e. on every new HTTP request to the server). The object is then passed into the path operation function. So we need some minor changes both in the application framework and in the path operation functions we register.

The following needs to be changed in ./wsgi/application/application.py.

#./wsgi/application/application.py
@@ -1,5 +1,6 @@
 from typing import Callable
 from dataclasses import dataclass
+from .request import Request
 
 
 class WSGIApplication:
@@ -33,9 +34,10 @@ class WSGIApplication:
             headers = [("Content-type", "text/plain")]
             body = b""
         else:
+            request = Request.from_environ(environ)
             status = "200 OK"
             headers = [("Content-type", "text/plain")]
-            body = func().encode("utf-8")
+            body = func(request=request).encode("utf-8")
         start_response(status, headers)
         return [body]

And the path operation functions in ./run.py need to change like this:

#./run.py
@@ -1,19 +1,17 @@
 from wsgi.server import WSGIServer
-from wsgi.application import WSGIApplication
+from wsgi.application import WSGIApplication, Request
 
 app = WSGIApplication()
 
 
 @app.get("/")
-def root():
-    print("Called root endpoint.")
-    return "hello from /"
+def root(request: Request):
+    return f"hello from / with query {request.query}"
 
 
 @app.post("/create")
-def create():
-    print(f"Called /create endpoint.")
-    return "hello from /create"
+def create(request: Request):
+    return f"hello from /create with request body {request.body}"

As you can see, the request object is passed right into the path operation functions. This is a reasonable approach for our mini-framework that does the job in a comprehensible way. As a comparison: in flask, access to the request data is provided through a context-local request object. This is a seemingly global object (you import it directly from the flask module), but when you access the object it actually proxies to an instance that is unique to the specific concurrency unit (e.g. a thread) that your application is currently running in. I’ve always found that to be a bit too “magical” for my taste, so let’s stick with our approach.
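
To make the “context-local” idea a bit more tangible, here is a rough sketch built on the standard library’s contextvars module. This is only an illustration of the concept, not how flask actually implements it, and all names in it (RequestProxy, FakeRequest, handle) are made up.

```python
import contextvars
import threading

# holds the request that belongs to the current thread/context
_current_request = contextvars.ContextVar("current_request")


class RequestProxy:
    """Forwards attribute access to the request set in the current context."""

    def __getattr__(self, name):
        return getattr(_current_request.get(), name)


# looks like a module-level global, but resolves to a different object per thread
request = RequestProxy()


class FakeRequest:
    def __init__(self, body: bytes):
        self.body = body


results = {}


def handle(name: str, body: bytes):
    # the framework would do this before calling the path operation function
    _current_request.set(FakeRequest(body))
    # the path operation function would just use the "global" request
    results[name] = request.body


threads = [
    threading.Thread(target=handle, args=(f"t{i}", f"body-{i}".encode("utf-8")))
    for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each thread only ever sees the request it set itself, even though all of them go through the same request name.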

The request object stores information about the request query parameters, the request body and the request headers. One thing we haven’t implemented is a way to handle path parameters. Those are parameters from paths like e.g. /items/{id} where the {id} could be any number of different values out of some permissible set. Implementing that would also make our path operation matching in the app a bit more complicated, so we’ll skip that.
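
For the curious, one way path parameters could be approached is to translate a path template into a regular expression with named groups. The sketch below is a standalone illustration under that assumption and is not wired into our framework; the function names compile_path and match_path are hypothetical, and every parameter simply matches one path segment as a string.

```python
import re
from typing import Dict, Optional


def compile_path(template: str):
    # splitting on "{name}" with a capturing group yields alternating
    # literal parts and parameter names: ["/items/", "id", ""]
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)  # literal text, escaped
        else:
            pattern += f"(?P<{part}>[^/]+)"  # parameter: one path segment
    return re.compile(pattern)


def match_path(template: str, path: str) -> Optional[Dict[str, str]]:
    # fullmatch so that the whole path has to fit the template
    match = compile_path(template).fullmatch(path)
    return match.groupdict() if match else None


single = match_path("/items/{id}", "/items/42")
double = match_path("/users/{user_id}/posts/{post_id}", "/users/7/posts/99")
no_match = match_path("/items/{id}", "/items/42/details")
```

With something like this, the app could no longer do a single dict lookup per request; it would have to try the registered templates one after another, which is the extra complexity mentioned above.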

Implementing different response types

One other thing that would be nice to have is a way to abstract the construction of the responses. Every response has a status and a couple of headers and (optionally) a response body. What’s different between responses is usually the mimetype they define for their body. We want a convenient way to define three different kinds of responses: a plain-text, an HTML and a JSON response.

Let’s do that in a new file ./wsgi/application/response.py.

#./wsgi/application/response.py
import json
from typing import List, Tuple, Optional, Any


class BaseResponse:
    def __init__(
        self,
        status: str = "200 OK",
        headers: Optional[List[Tuple[str, str]]] = None,
        body: Optional[Any] = None,
    ):
        self.status = status
        self.headers = headers if headers is not None else []
        self.body = self.body_conversion(body) if body is not None else b""
        self.add_content_type_and_content_length()

    def add_content_type_and_content_length(self):
        header_names = {name for name, value in self.headers}
        if "Content-Type" not in header_names:
            self.headers.append(("Content-Type", self.content_type))
        if self.body and "Content-Length" not in header_names:
            self.headers.append(("Content-Length", str(len(self.body))))


class PlainTextResponse(BaseResponse):
    content_type = "text/plain"

    @classmethod
    def body_conversion(cls, body):
        return body.encode("utf-8")


class HTMLResponse(BaseResponse):
    content_type = "text/html"

    @classmethod
    def body_conversion(cls, body):
        return body.encode("utf-8")


class JSONResponse(BaseResponse):
    content_type = "application/json"

    @classmethod
    def body_conversion(cls, body):
        return json.dumps(body).encode("utf-8")

They all derive from a common BaseResponse, with the only differences being the Content-Type header and how they convert their body to bytes. The add_content_type_and_content_length() method is in charge of automatically setting the Content-Type header and also determining the body length and setting it as the Content-Length header.
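
One detail worth spelling out: the Content-Length has to be the number of bytes in the encoded body, not the number of characters in the original string. That is why the length is taken after the body conversion. A tiny standalone illustration (the example text is arbitrary):

```python
text = "héllo wörld"
encoded = text.encode("utf-8")

char_count = len(text)  # number of characters in the string
byte_count = len(encoded)  # number of bytes that actually go over the wire

# "é" and "ö" each take two bytes in UTF-8, so the two counts differ
content_length_header = ("Content-Length", str(byte_count))
```

For pure ASCII bodies both numbers are the same, which makes this an easy mistake to miss in testing.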

Now we can simplify our framework in ./wsgi/application/application.py.

#./wsgi/application/application.py
@@ -1,6 +1,7 @@
 from typing import Callable
 from dataclasses import dataclass
 from .request import Request
+from .response import PlainTextResponse, BaseResponse
 
 
 class WSGIApplication:
@@ -30,16 +31,16 @@ class WSGIApplication:
         po = PathOperation(environ["PATH_INFO"], environ["REQUEST_METHOD"])
         func = self.path_operations.get(po)
         if func is None:
-            status = "404 NOT FOUND"
-            headers = [("Content-type", "text/plain")]
-            body = b""
+            response = PlainTextResponse(status="404 NOT FOUND")
         else:
             request = Request.from_environ(environ)
-            status = "200 OK"
-            headers = [("Content-type", "text/plain")]
-            body = func(request=request).encode("utf-8")
-        start_response(status, headers)
-        return [body]
+            ret = func(request=request)
+            if isinstance(ret, BaseResponse):
+                response = ret
+            else:
+                response = PlainTextResponse(body=ret)
+        start_response(response.status, response.headers)
+        return [response.body]

Defining a correct response is a lot less fiddly and much more expressive now. The default response from a path operation function would be a PlainTextResponse. But we can now use those response types directly in a path operation function if we don’t want the default type. Let’s do that in ./run.py and define a new path operation function that returns a JSONResponse.

#./run.py
from wsgi.server import WSGIServer
from wsgi.application import WSGIApplication, Request
from wsgi.application.response import JSONResponse

app = WSGIApplication()


@app.get("/")
def root(request: Request):
    return f"hello from / with query {request.query}"


@app.post("/create")
def create(request: Request):
    return f"hello from /create with request body {request.body}"


@app.post("/some_json")
def some_json(request: Request):
    return JSONResponse(body={"abc": "def"})


if __name__ == "__main__":
    server = WSGIServer("127.0.0.1", 5000, app)
    server.serve_forever()

You can test this with curl to see that you’re getting the correct payload and headers back.

$ curl localhost:5000/some_json -X POST -i
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 14

{"abc": "def"}

Notes

And that’s as far as we want to go with this. Our little framework is already pretty potent; adding any further features is left as an exercise to the reader.

This also concludes our treatment of WSGI. We’ve seen how we can implement a WSGI server all the way from a pretty low-level socket-based TCP server. We’ve seen how threads can be used to add concurrency to the server. We’ve looked at the interface between a WSGI server and a WSGI application. And we’ve now even created a tiny framework that allows you to quickly build WSGI applications without really having to know what WSGI even is.

Seemingly this is a situation that allows one to build and deploy any imaginable web application. But if you remember the initial post in this series, you’ll recall that there is a successor specification to WSGI called ASGI. We’ll look at the motivations for it in the next post.