Part 5 - A simple HTTP server in Python
(The changes introduced in this post start here.)
In the previous post we implemented a very simple TCP server. We also noted that this raw data exchange between a client and the server is not enough for a meaningful transfer of information. A rough analogy from human communication: we have paired up two people who can hear each other’s voices, but neither of them speaks the other’s language. On top of that, our server currently only echoes back incoming messages. That is rather like pairing up a human and a parrot, which is not very useful.
Let’s fix both of those issues.
The structure of HTTP
The shared language (protocol) between clients and servers on the web is usually HTTP. It defines a standard format for what requests and responses look like. Clients can use it to request resources from the server or to send data to it, and the server can reply with a useful response.
The Mozilla Developer Network (MDN) has some great information about the structure of those messages here.
In short, every request/response is made up of 3 distinct parts:
1. a start line (in a request) or status line (in a response)
2. (optional) headers
3. (optional) body
In the request, the start line contains the HTTP method, a path/url and the HTTP version that the client wants to use. The HTTP method usually looks like a verb. It tells the server what kind of operation the client wants to perform, e.g. GET a resource or POST some data to the server. The url tells the server which resource the client wants to interact with, e.g. /index.html. The HTTP version makes sure the client and server agree on exactly which version of the language they are speaking.
In the response, the start line (also called the status line) again states the HTTP version. It is followed by a status code, which tells whether the request was served successfully or, if not, what kind of error happened. Last comes a short textual description of that status code, the reason phrase (e.g. 200 OK or 404 Not Found).
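To make the two start lines concrete, here is a small sketch that splits a request start line into its three parts and builds a response status line. The helper names parse_request_line and build_status_line are hypothetical. The reason phrase comes from the standard library's http.HTTPStatus.

```python
import http


def parse_request_line(line: bytes):
    # "GET /index.html HTTP/1.1" -> (method, path, version); exactly three space-separated parts
    method, path, version = line.decode("ascii").split(" ")
    return method, path, version


def build_status_line(status_code: int, version: str = "HTTP/1.1") -> bytes:
    # version, numeric status code and reason phrase, terminated by CRLF
    phrase = http.HTTPStatus(status_code).phrase
    return f"{version} {status_code} {phrase}\r\n".encode("ascii")


print(parse_request_line(b"GET /index.html HTTP/1.1"))  # ('GET', '/index.html', 'HTTP/1.1')
print(build_status_line(404))  # b'HTTP/1.1 404 Not Found\r\n'
```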
The (optional) headers in both request and response are like metadata for the rest of the message. They can be used for a million different things, e.g. to transmit cookies or authentication credentials or to inform the other party about the type of content in the message body. You can read up on them here.
Lastly, the (optional) body is the meat of the message. Suppose a client requests a website from a webserver. The HTML, CSS, JavaScript and images of that website are then transferred in the bodies of the responses, with a separate request/response for each individual resource.
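Here is what the three parts look like in an actual message. The sketch below assumes that the complete request is already in memory and ignores details such as chunked transfer encoding, which is why a real parser comes in handy later. The function name split_message is hypothetical. The key point is that an empty line (CRLF CRLF) separates the start line and headers from the body.

```python
from typing import List, Tuple

raw_request = (
    b"POST /submit HTTP/1.1\r\n"
    b"Host: localhost:5000\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)


def split_message(raw: bytes) -> Tuple[bytes, List[Tuple[bytes, bytes]], bytes]:
    # the empty line (CRLF CRLF) ends the start line + headers section
    head, _, body = raw.partition(b"\r\n\r\n")
    start_line, *header_lines = head.split(b"\r\n")
    headers = []
    for line in header_lines:
        # each header is "Name: value"; surrounding whitespace is not part of the value
        name, _, value = line.partition(b":")
        headers.append((name, value.strip()))
    return start_line, headers, body


start_line, headers, body = split_message(raw_request)
print(start_line)  # b'POST /submit HTTP/1.1'
print(headers)
print(body)  # b'hello'
```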
Teaching the server to speak HTTP
Our server can already receive arbitrary raw data from clients. Once we treat the incoming data as HTTP requests and send back useful HTTP responses, we’ll have a webserver.
For now, let’s use the excellent httptools library to parse requests. It provides a parser object that you can continuously feed with the stream of incoming data. The parser hands you the relevant pieces (HTTP method, url, headers, body, …) nicely chopped up.
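The "continuously feed" part matters because TCP delivers data in arbitrary chunks. One recv() call may return half a header. The sketch below only illustrates that idea and is not how httptools works internally. The class HeadBuffer and its feed() method are hypothetical (httptools' real entry point is feed_data()). The class keeps buffering chunks until the empty line that ends the headers has arrived.

```python
from typing import Optional


class HeadBuffer:
    """Hypothetical illustration: collect chunks until the request head is complete."""

    def __init__(self) -> None:
        self._buffer = b""

    def feed(self, data: bytes) -> Optional[bytes]:
        self._buffer += data
        end = self._buffer.find(b"\r\n\r\n")
        if end == -1:
            return None  # headers not complete yet, wait for more data
        return self._buffer[:end]  # start line + headers, without the empty line


buf = HeadBuffer()
# chunk boundaries can fall anywhere, even in the middle of a word
for chunk in [b"GET / HT", b"TP/1.1\r\nHost: local", b"host\r\n\r\n"]:
    head = buf.feed(chunk)
print(head)  # b'GET / HTTP/1.1\r\nHost: localhost'
```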
Create a new file http_request.py with the following content.
```python
#./http_request.py
from typing import Callable, List, Tuple


class HttpRequestParserProtocol:
    def __init__(self, send_response: Callable[[], None]):
        # we hand in and save a callback to be triggered once
        # we have received the entire request and can send a response
        self.send_response = send_response
        self.headers: List[Tuple[bytes, bytes]] = []

    # parser callbacks

    # gets called with the url from the request's start line
    def on_url(self, url: bytes):
        print(f"Received url: {url}")

    # gets called on every header that is read from the request
    def on_header(self, name: bytes, value: bytes):
        print(f"Received header: ({name}, {value})")
        self.headers.append((name, value))

    # gets called continuously while reading chunks of the body
    def on_body(self, body: bytes):
        print(f"Received body: {body}")

    # gets called once the request was fully received and parsed
    def on_message_complete(self):
        print("Received request completely.")
        self.send_response()
```
This class needs to be instantiated and handed to the httptools parser. As we feed the parser with incoming request data, it calls these callbacks (e.g. on_url, on_header, on_body) and passes them the parsed data.
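To see the callback pattern in isolation, here is a toy dispatcher that drives a protocol object the same way. As far as I know, httptools treats every callback as optional and only calls the ones your protocol defines. The sketch mimics that with getattr. ToyParser and feed_complete are hypothetical and, unlike httptools, expect the complete request in a single call. RecordingProtocol deliberately leaves out on_body.

```python
class ToyParser:
    """Hypothetical stand-in for a callback-driven parser; needs the whole request at once."""

    def __init__(self, protocol) -> None:
        self.protocol = protocol

    def _call(self, name: str, *args) -> None:
        # only call callbacks that the protocol object actually defines
        callback = getattr(self.protocol, name, None)
        if callback is not None:
            callback(*args)

    def feed_complete(self, raw: bytes) -> None:
        head, _, body = raw.partition(b"\r\n\r\n")
        start_line, *header_lines = head.split(b"\r\n")
        _method, url, _version = start_line.split(b" ")
        self._call("on_url", url)
        for line in header_lines:
            name, _, value = line.partition(b":")
            self._call("on_header", name, value.strip())
        if body:
            self._call("on_body", body)
        self._call("on_message_complete")


class RecordingProtocol:
    # records the callbacks it receives; on_body is deliberately missing
    def __init__(self) -> None:
        self.events = []

    def on_url(self, url: bytes) -> None:
        self.events.append(("url", url))

    def on_header(self, name: bytes, value: bytes) -> None:
        self.events.append(("header", name, value))

    def on_message_complete(self) -> None:
        self.events.append(("complete",))


protocol = RecordingProtocol()
ToyParser(protocol).feed_complete(b"POST /form HTTP/1.1\r\nContent-Length: 2\r\n\r\nhi")
print(protocol.events)
```

The body b"hi" is silently skipped because RecordingProtocol has no on_body method.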
In another new file, http_response.py, let’s add some convenience functions for creating a well-formed HTTP response.
```python
#./http_response.py
from typing import List, Optional, Tuple
import http


def create_status_line(status_code: int = 200) -> bytes:
    code = str(status_code).encode()
    code_phrase = http.HTTPStatus(status_code).phrase.encode()
    return b"HTTP/1.1 " + code + b" " + code_phrase + b"\r\n"


def format_headers(headers: List[Tuple[bytes, bytes]]) -> bytes:
    return b"".join([key + b": " + value + b"\r\n" for key, value in headers])


def make_response(
    status_code: int = 200,
    headers: Optional[List[Tuple[bytes, bytes]]] = None,
    body: bytes = b"",
) -> bytes:
    # copy the list so we never mutate the caller's headers
    headers = list(headers) if headers else []
    if body:
        # when sending a body, tell the client how many bytes to expect
        headers.append((b"Content-Length", str(len(body)).encode("utf-8")))
    content = [
        create_status_line(status_code),
        format_headers(headers),
        # the empty line that ends the headers is required even without a body
        b"\r\n",
        body,
    ]
    return b"".join(content)
```
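A quick way to check that such bytes are well-formed HTTP is to parse them back with the standard library's http.client. The bytes below are what make_response(status_code=200, body=b"<html><body>Hello World</body></html>") returns. BytesSocket is a hypothetical fake socket. It works because http.client.HTTPResponse only calls makefile() on the socket it is given.

```python
import http.client
import io


class BytesSocket:
    """Minimal fake socket: HTTPResponse only calls makefile() on it."""

    def __init__(self, data: bytes) -> None:
        self._data = data

    def makefile(self, mode: str, *args, **kwargs) -> io.BytesIO:
        return io.BytesIO(self._data)


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 37\r\n"
    b"\r\n"
    b"<html><body>Hello World</body></html>"
)

response = http.client.HTTPResponse(BytesSocket(raw))
response.begin()  # parses the status line and the headers
body = response.read()  # reads exactly Content-Length bytes
print(response.status, response.reason, response.getheader("Content-Length"))  # 200 OK 37
print(body)
```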
The parser itself is created and fed inside the handle_socket() function in server.py. Let’s add those changes.
```python
#./server.py
from typing import Tuple
import socket
import threading

from httptools import HttpRequestParser

from http_request import HttpRequestParserProtocol
from http_response import make_response


def handle_socket(client_socket: socket.socket, address: Tuple[str, int]):
    # keep track of whether we have already sent a response
    response_sent = False

    # nested function for closure on local variables
    # that's the callback we'll trigger in the parser
    # once we should send the response
    def send_response():
        nonlocal response_sent
        # a dummy response for now
        body = b"<html><body>Hello World</body></html>"
        response = make_response(status_code=200, headers=[], body=body)
        # sendall() keeps sending until every byte is out; send() may send only part
        client_socket.sendall(response)
        print("Response sent.")
        response_sent = True

    # instantiate the protocol and parser
    # hand in the send_response callback to get triggered by the parser
    protocol = HttpRequestParserProtocol(send_response)
    parser = HttpRequestParser(protocol)

    while True:
        # have we already replied? then close the socket
        if response_sent:
            break
        data = client_socket.recv(1024)
        if not data:
            # the client closed the connection before finishing its request
            break
        print(f"Received {data}")
        # continuously feed incoming request data into the parser
        parser.feed_data(data)

    client_socket.close()
    print(f"Socket with {address} closed.")


def serve_forever(host: str, port: int):
    server_socket = socket.socket()
    server_socket.bind((host, port))
    server_socket.listen(1)

    while True:
        client_socket, address = server_socket.accept()
        print(f"Socket established with {address}.")
        t = threading.Thread(
            target=handle_socket, args=(client_socket, address)
        )
        t.start()


if __name__ == "__main__":
    serve_forever("127.0.0.1", 5000)
```
We continuously feed the incoming data into the parser until the entire request has arrived. At that point the parser calls our send_response callback, and for now we send back a dummy response with a dummy HTML body.
The request/response flow also differs slightly from our previous TCP server. We now wait until a single request has fully arrived, create and send a response, and immediately close the socket. There is no more waiting until the client sends us an empty message. One request, one response, over. You can read more about the flow of an HTTP session between a client and a server here.
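The one-request-one-response-close flow can be reproduced without a network. In the sketch below, socket.socketpair() stands in for a real TCP connection, and the hypothetical serve_one() replaces httptools with a naive "read until the empty line" loop. That loop is enough for a bodyless GET. The client learns that the response is complete when recv() returns b"", which happens once the server closes its end.

```python
import socket
import threading


def serve_one(conn: socket.socket) -> None:
    # naive stand-in for the parser: read until the empty line that ends the request head
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(1024)
        if not chunk:
            conn.close()  # client went away before finishing its request
            return
        data += chunk
    body = b"Hello World"
    conn.sendall(
        b"HTTP/1.1 200 OK\r\nContent-Length: "
        + str(len(body)).encode()
        + b"\r\n\r\n"
        + body
    )
    conn.close()  # one request, one response, over


def fetch(conn: socket.socket) -> bytes:
    conn.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
    response = b""
    while True:
        chunk = conn.recv(1024)
        if not chunk:  # b"" means the server closed the connection
            break
        response += chunk
    conn.close()
    return response


server_end, client_end = socket.socketpair()
thread = threading.Thread(target=serve_one, args=(server_end,))
thread.start()
raw_response = fetch(client_end)
thread.join()
print(raw_response)  # b'HTTP/1.1 200 OK\r\nContent-Length: 11\r\n\r\nHello World'
```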
Now start the server and test whether everything works using curl. It should look like this:
```
$ curl localhost:5000 -i
HTTP/1.1 200 OK
Content-Length: 37

<html><body>Hello World</body></html>
```
The server will have logged something like the following:
```
$ python server.py
Socket established with ('127.0.0.1', 42702).
Received b'GET / HTTP/1.1\r\nHost: localhost:5000\r\nUser-Agent: curl/7.69.1\r\nAccept: */*\r\n\r\n'
Received url: b'/'
Received header: (b'Host', b'localhost:5000')
Received header: (b'User-Agent', b'curl/7.69.1')
Received header: (b'Accept', b'*/*')
Received request completely.
Response sent.
Socket with ('127.0.0.1', 42702) closed.
```
Of course, the server is currently a bit ignorant and returns the same dummy response no matter what the request actually asks for. It’s still nice to see it in action, and you can even test it in a browser: go to localhost:5000 and you should see Hello World rendered. Visiting a website in a browser is nothing other than sending an HTTP GET request. The path part of the address you type in becomes the url in the start line, and the host part ends up in the Host header.
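The mapping from an address to a request can be sketched with the standard library's urllib.parse.urlsplit. The helper request_for is hypothetical and builds only the bare minimum. Real browsers add many more headers (User-Agent, Accept, …). Note that the #fragment part of an address is never sent to the server.

```python
from urllib.parse import urlsplit


def request_for(address: str) -> bytes:
    # hypothetical helper: the minimal GET request head for an address
    parts = urlsplit(address)
    path = parts.path or "/"  # an empty path means the root "/"
    if parts.query:
        path += "?" + parts.query
    # parts.fragment (after "#") stays in the browser and is not sent
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "\r\n"
    ).encode("ascii")


print(request_for("http://localhost:5000/index.html?lang=en"))
# b'GET /index.html?lang=en HTTP/1.1\r\nHost: localhost:5000\r\n\r\n'
```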
Refactoring to a more object-oriented style
One major problem I see with the code as it stands is that handle_socket in server.py has become quite convoluted. It contains a nested function that relies on a closure, and it needs the nonlocal keyword to change a variable captured by that closure. That is too much complexity for a single function. Let’s refactor our server into an object-oriented version to fix this.
```python
#./server.py
from typing import Tuple
import socket
import threading

from httptools import HttpRequestParser

from http_request import HttpRequestParserProtocol
from http_response import make_response


class Session:
    """Handles the whole request/response exchange on one client socket."""

    def __init__(self, client_socket: socket.socket, address: Tuple[str, int]):
        self.client_socket = client_socket
        self.address = address
        self.response_sent = False
        protocol = HttpRequestParserProtocol(self.send_response)
        self.parser = HttpRequestParser(protocol)

    def run(self):
        while not self.response_sent:
            data = self.client_socket.recv(1024)
            if not data:
                # the client closed the connection before finishing its request
                break
            print(f"Received {data}")
            self.parser.feed_data(data)
        self.client_socket.close()
        print(f"Socket with {self.address} closed.")

    def send_response(self):
        body = b"<html><body>Hello World</body></html>"
        response = make_response(status_code=200, headers=[], body=body)
        self.client_socket.sendall(response)
        print("Response sent.")
        self.response_sent = True


def serve_forever(host: str, port: int):
    server_socket = socket.socket()
    server_socket.bind((host, port))
    server_socket.listen(1)

    while True:
        client_socket, address = server_socket.accept()
        print(f"Socket established with {address}.")
        session = Session(client_socket, address)
        t = threading.Thread(target=session.run)
        t.start()


if __name__ == "__main__":
    serve_forever("127.0.0.1", 5000)
```
The entire communication on a particular client_socket is now handled by a Session object. To my eye this is a lot cleaner.
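For comparison, the standard library's socketserver module is built around the same one-object-per-connection idea. ThreadingTCPServer creates a new handler instance for every accepted connection and runs it in its own thread. This is roughly what Session plus threading.Thread does for us. The sketch below keeps the naive "read until the empty line" loop instead of httptools, so it ignores request bodies. The name make_socketserver is a hypothetical helper.

```python
import socketserver


class HelloHandler(socketserver.BaseRequestHandler):
    """One instance per connection, much like our Session."""

    def handle(self) -> None:
        data = b""
        # naive: read until the empty line that ends the request head (no body support)
        while b"\r\n\r\n" not in data:
            chunk = self.request.recv(1024)
            if not chunk:
                return  # client went away early
            data += chunk
        body = b"<html><body>Hello World</body></html>"
        self.request.sendall(
            b"HTTP/1.1 200 OK\r\nContent-Length: "
            + str(len(body)).encode()
            + b"\r\n\r\n"
            + body
        )
        # after handle() returns, the server closes the connection for us


def make_socketserver(host: str, port: int) -> socketserver.ThreadingTCPServer:
    # each accepted connection gets its own thread and its own HelloHandler
    return socketserver.ThreadingTCPServer((host, port), HelloHandler)
```

Running make_socketserver("127.0.0.1", 5000).serve_forever() should give the same curl output as our own server.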
Notes
Whenever I talk about HTTP here, I mean HTTP/1.1. There are other versions. HTTP/2, for example, encodes messages differently, in binary frames instead of plain text. It keeps the same building blocks (method, path, status code, headers, body), though. If you understand the HTTP/1.1 message structure, that knowledge translates to HTTP/2 easily, so you need not worry.
For educational purposes, we’ve worked our way up from low-level socket components to a threaded server that can communicate using HTTP. The Python standard library also provides a ready-made HTTP server in http.server if you’d like to check that out.
We now have a fully functioning, albeit somewhat brainless, webserver. Simply reacting to the actual content of the requests would let us add all kinds of useful behavior. In the next post we’ll look at replacing httptools with a very simple, custom HTTP parser.