Here at Kogan we do a hack day every 2 months. The organisers will come up with a theme for the day, we'll split off into teams, and get to building.

For our March hackday this year we decided to build a multiplayer game using Django Channels. We've been keeping an eye on Channels for some time, and with the release of Channels 2.0 we thought it was the right time to dive in and get some practical experience. The organisers didn't want to do yet-another-chat-system implementation, so they decided to make things a bit more interesting, and look at writing a real-time game.

We split up into two teams. The first team decided to build a top-down zombie shooter based on pygame using a tiled map with the design of our office. The second team decided to build a multiplayer version of the popular mobile game Snake. This post is about the second project, which we affectionately named Snek.

Note: We know that Snake isn't exactly a real-time game, but we wanted to focus more on how channels worked, rather than the mechanics of the game itself.

This will be a fairly technical analysis of the project, the choices we made, and how Channels works in a semi-production-like setting. We made the decision to open-source the project in the interests of providing a real-life example to the community of how the various pieces fit together.

Introducing https://github.com/kogan/kogame-snek/. I've also just published the game to heroku at https://kogame-snek.herokuapp.com/ so you can actually load it up and play.

High Level Architecture

Tile based games like Snake aren't very complex conceptually. There's a 2-dimensional array to represent the game board. Within each element of that array is some kind of object, like a snake segment, a piece of food, or a wall. It can also be empty, representing a tile that can be moved into.

Each snake can be represented as a list of coordinates within the grid. Food can be represented as a single coordinate within the grid.

At any given moment, the entire game can be represented in roughly the following way:

{
    "dimensions": [24, 24],
    "players": [
        {
            "name": "username-1", 
            "snake": [(0, 0), (0, 1), (0, 2)], 
            "direction": "RIGHT"
        },
        ...
    ],
    "food": [(14, 9), (20, 18)],
    "tick": 143
}
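To make the link between this compact state and the 2-dimensional board concrete, here is a minimal sketch that expands one into the other. It is not taken from the Snek codebase: the tile markers, the build_board name, and the reading of coordinates as (row, column) with dimensions as [rows, columns] are all assumptions made for illustration.

```python
from typing import List

# Hypothetical tile markers, chosen only for this example.
EMPTY, SNAKE, FOOD = ".", "S", "F"


def build_board(state: dict) -> List[List[str]]:
    """Expand the compact game state into a 2D array of tiles.

    Assumes coordinates are (row, col) and dimensions are [rows, cols].
    """
    rows, cols = state["dimensions"]
    board = [[EMPTY for _ in range(cols)] for _ in range(rows)]
    for row, col in state["food"]:
        board[row][col] = FOOD
    for player in state["players"]:
        for row, col in player["snake"]:
            board[row][col] = SNAKE
    return board


example_state = {
    "dimensions": [4, 6],
    "players": [
        {"name": "username-1", "snake": [(0, 0), (0, 1), (0, 2)], "direction": "RIGHT"}
    ],
    "food": [(2, 4)],
    "tick": 0,
}
example_board = build_board(example_state)
```

The client never needs the server to send this full grid. Sending only the snakes and the food keeps each message small, and the grid can be rebuilt wherever it is rendered.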

After each "tick" (a tick represents the next game state) the game recalculates the next position of each snake, works out whether any of the snakes collided and are now out of the game, whether a piece of food was eaten, and so on. It will then render the next game state, and deliver that to each game participant.
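A tick along those lines might look roughly like the sketch below. This is an illustration under stated assumptions rather than the engine we shipped: coordinates are treated as (row, column), the head of each snake is assumed to be the last element of its list, and food respawning is left out entirely. The next_state name and the DELTAS table are made up for the example.

```python
from collections import Counter
from typing import Dict, Tuple

Coord = Tuple[int, int]

# Assumed (row, col) offsets for each direction.
DELTAS: Dict[str, Coord] = {
    "UP": (-1, 0),
    "DOWN": (1, 0),
    "LEFT": (0, -1),
    "RIGHT": (0, 1),
}


def next_state(state: dict) -> dict:
    """Compute the state after one tick (movement, eating, collisions)."""
    rows, cols = state["dimensions"]
    food = set(state["food"])

    # 1. Move every snake one tile; a snake that eats keeps its tail and grows.
    moved = []
    for player in state["players"]:
        snake = list(player["snake"])
        d_row, d_col = DELTAS[player["direction"]]
        head = (snake[-1][0] + d_row, snake[-1][1] + d_col)
        snake.append(head)
        if head in food:
            food.discard(head)
        else:
            snake.pop(0)
        moved.append({**player, "snake": snake})

    # 2. A snake is out if its head left the board or shares a tile with
    #    any other segment (its own body or another snake).
    occupied = Counter(seg for player in moved for seg in player["snake"])
    survivors = []
    for player in moved:
        head = player["snake"][-1]
        in_bounds = 0 <= head[0] < rows and 0 <= head[1] < cols
        if in_bounds and occupied[head] == 1:
            survivors.append(player)

    return {
        "dimensions": state["dimensions"],
        "players": survivors,
        "food": sorted(food),
        "tick": state["tick"] + 1,
    }
```

Returning a fresh dictionary instead of mutating the old one means the previous state can still be read safely while the new one is being broadcast.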

We use React.js a lot at Kogan, so we figured we'd represent the game state as a grid of divs on the page. setState could then be used to update the board each time it received a new state from the server. For a snake to change direction, the client would send the server a new direction, which would be used during the next game tick. In a real game, you likely would not use a grid of divs that was updated roughly 5 times per second. It has very bad performance, resulting in many missed "frames". Again, we were mostly focussed on learning how channels worked, so this was an acceptable solution for now. In the future, we'll look into using Canvas.

Game Board Screenshot

Given we had a good idea of the game data format, and how we'd represent that on the frontend, we then had to figure out a few things.

1. How do we update the client in an efficient manner?
2. How can the client update the server to change direction?
3. How do we run the game engine on the server so that it is shared between multiple clients?

These are exactly the kinds of things that Django Channels allows us to do.

Enter Channels

The primary purpose of channels is to enable WebSockets in your Django project.

WebSockets are an advanced technology that makes it possible to open an interactive communication session between the user's browser and a server. With this API, you can send messages to a server and receive event-driven responses without having to poll the server for a reply.

HTTP is based around the request/response model. The client sends a request to the server, and the server responds with the data. There is no facility in HTTP (HTTP 1.x) for the server to send data to the client when the server is ready.

A WebSocket is a persistent connection that a client can make to a server. Since the connection is always open, the server can send data to the client whenever it likes. That data might be new HTML, or a JSON structure representing the state of a game. It is then up to the client to interpret each message and react (hehe) accordingly. The client may also send messages to the server, which the server would also need to interpret and process, depending on the data being sent.

Channels provides an application server that can maintain these persistent websocket connections, but it can also handle the traditional HTTP requests and responses. It's called daphne and replaces the function of gunicorn or uwsgi in your deployment.

Note: You still can use gunicorn or uwsgi for your regular HTTP traffic if you like.

Another thing that channels does is provide the concept of groups, which allow multiple websocket clients to communicate with each other over a lightweight pub-sub channel (hence the name). There are a few different layers that channels can use, but the most mature and easiest to deploy layer is probably channels_redis.

Syncing game state

Each client in a channels deployment is represented as an instance of a Consumer. The actual instance persists for as long as the client remains connected. For a websocket, a client will connect(), the server will accept(), and then the server can either send() data to the client, or receive() data from the client.

You can follow along in the next section with the PlayerConsumer class.

Given what we know about channels, let's go back to our 3 issues and see how we can solve them.

1. How do we update the client in an efficient manner?

All clients connect, via websockets, to a channels group. When the game has a new state ready, it publishes the state to the group, which every client receives. The client can then use that new state to update its UI.

import json

# Both methods below live on PlayerConsumer (an AsyncWebsocketConsumer).
async def connect(self):
    self.group_name = "snek_game"
    # Join a common group with all other players
    await self.channel_layer.group_add(self.group_name, self.channel_name)
    await self.accept()

# Handles "game_update" messages sent to the group after a tick is processed
async def game_update(self, event):
    # Forward the new state to this client's WebSocket
    state = event["state"]
    await self.send(text_data=json.dumps(state))

2. How can the client update the server to change direction?

All the clients are already connected via websockets, so the client publishes the new direction over the websocket. The below code shows how the server must decode and interpret the message it's receiving, as the client can either join a game or publish a new direction.

# Receive message from Websocket
async def receive(self, text_data=None, bytes_data=None):
    content = json.loads(text_data)
    msg_type = content["type"]
    msg = content["msg"]
    if msg_type == "direction":
        return await self.direction(msg)
    elif msg_type == "join":
        return await self.join(msg)
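One thing the snippet above glosses over is what happens when a client sends something unexpected: malformed JSON or a missing key would raise inside the consumer. A defensive parsing step could look something like the sketch below. The parse_client_message helper and the two whitelists are hypothetical, and the only thing assumed about the payload is what the snippets in this post show, namely that a direction message carries a msg dictionary with a direction key.

```python
import json
from typing import Any, Optional, Tuple

# Hypothetical whitelists, based only on the message types shown in this post.
VALID_TYPES = {"direction", "join"}
VALID_DIRECTIONS = {"UP", "DOWN", "LEFT", "RIGHT"}


def parse_client_message(text_data: Optional[str]) -> Optional[Tuple[str, Any]]:
    """Return (msg_type, msg) for a well-formed client message, else None."""
    if not text_data:
        return None
    try:
        content = json.loads(text_data)
    except json.JSONDecodeError:
        return None
    if not isinstance(content, dict):
        return None

    msg_type = content.get("type")
    msg = content.get("msg")
    if not isinstance(msg_type, str) or msg_type not in VALID_TYPES:
        return None

    if msg_type == "direction":
        # A direction message must carry one of the four known directions.
        if not isinstance(msg, dict):
            return None
        direction = msg.get("direction")
        if not isinstance(direction, str) or direction not in VALID_DIRECTIONS:
            return None
    return msg_type, msg
```

A receive() method could call this first and simply ignore (or log) anything that comes back as None, so that one misbehaving client cannot crash its consumer.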

3. How do we run the game engine on the server so that it is shared between multiple clients?

Hmm, so this question isn't easy to answer given the information we already know. Let's dive a bit deeper.

Running the game engine

Game engines like this typically run in one big infinite loop. In fact, this is exactly how we chose to run our game loop:

def run(self) -> None:
    while True:
        self.state = self.tick()
        self.broadcast_state(self.state)
        time.sleep(self.tick_rate)

The game calculates the next state with tick(), it then publishes or broadcasts that state in some way, and then goes to sleep for some small amount of time. It'll then wake up and do it all again.
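One subtlety with sleeping for a fixed tick_rate is that the real time between ticks becomes tick_rate plus however long tick() and the broadcast took. For our purposes that was fine, but if you want a steadier cadence, a deadline-based loop is one option. The sketch below is an assumption-laden variation and not what Snek does: run_fixed_rate and its max_ticks parameter are invented here, with max_ticks existing only so that the example terminates.

```python
import time
from typing import Callable


def run_fixed_rate(step: Callable[[], None], tick_rate: float, max_ticks: int) -> None:
    """Call step() roughly every tick_rate seconds, max_ticks times.

    Sleeps until the next deadline instead of for a fixed duration, so the
    time spent inside step() does not stretch the tick interval.
    """
    next_deadline = time.monotonic()
    for _ in range(max_ticks):
        step()
        next_deadline += tick_rate
        delay = next_deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        else:
            # step() overran the tick; reset rather than trying to catch up.
            next_deadline = time.monotonic()


tick_log = []
run_fixed_rate(lambda: tick_log.append(time.monotonic()), tick_rate=0.01, max_ticks=3)
```

time.monotonic() is used rather than time.time() so that system clock adjustments cannot make the loop sleep for a surprising amount of time.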

Now, if you forget about wanting this to interact with a webserver, you'd probably create a thread or process, and run it indefinitely. Indeed, that's what we do:

class GameEngine(threading.Thread):
    def run(self) -> None:
        ...

But what starts this thread? And how do we interact with that thread via our websocket clients?

The answer is, we create a new consumer! Consumers can join groups and interact with each other over the channel layer (redis). But aren't consumers websocket clients? Nope. Consumers are classes that communicate with each other over a channel layer. There are many different kinds of consumers. We used an AsyncWebsocketConsumer for the client, but we can use a SyncConsumer that's running our infinite loop game engine, and have the two communicate over the channel layer!

class GameConsumer(SyncConsumer):
    def __init__(self, *args, **kwargs):
        """
        Created on demand when the first player joins.
        """
        super().__init__(*args, **kwargs)
        self.group_name = "snek_game"
        self.engine = GameEngine(self.group_name)
        # Runs the engine in a new thread
        self.engine.start()

    def player_new(self, event):
        self.engine.join_queue(event["player"])

    def player_direction(self, event):
        direction = event.get("direction", "UP")
        self.engine.set_player_direction(event["player"], direction)
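Notice that player_direction is called from the consumer while the engine's loop runs in its own thread, so anything the two share needs some care. The sketch below shows one way that hand-off could be made safe with a lock. It is a guess at a reasonable design, not a description of how the Snek engine stores directions: the DirectionBuffer class and its method names are hypothetical.

```python
import threading
from typing import Dict


class DirectionBuffer:
    """Hypothetical helper: the consumer writes, the engine reads once per tick."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: Dict[str, str] = {}

    def set_direction(self, player: str, direction: str) -> None:
        # Called from the consumer; the last write before a tick wins.
        with self._lock:
            self._pending[player] = direction

    def drain(self) -> Dict[str, str]:
        # Called from the engine thread at the start of each tick.
        with self._lock:
            pending, self._pending = self._pending, {}
        return pending


buffer = DirectionBuffer()
writers = [
    threading.Thread(target=buffer.set_direction, args=(f"player-{i}", "UP"))
    for i in range(5)
]
for writer in writers:
    writer.start()
for writer in writers:
    writer.join()
buffer.set_direction("player-0", "LEFT")
drained = buffer.drain()
```

Swapping the dictionary out under the lock keeps the critical section tiny, so the engine never holds the lock while it is busy calculating a tick.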

And the engine is able to send data back to the group by asking for the currently active channel layer:

from asgiref.sync import async_to_sync
from channels.layers import get_channel_layer

def broadcast_state(self, state: State) -> None:
    state_json = state.render()
    channel_layer = get_channel_layer()
    # group_send is a coroutine; async_to_sync lets the engine thread call it synchronously
    async_to_sync(channel_layer.group_send)(
        self.group_name, {"type": "game_update", "state": state_json}
    )

Ok, so we have an engine running infinitely. It is started by a consumer, so has access to the consumer's channel layer. But what runs the consumer?! (I'm finally getting to it).

Workers

Have you used celery before? It runs as a separate service alongside your normal Django application server. It can receive and process tasks from a queue.

Well, it's the same concept with Channels workers. Channels can run workers, which run on top of the channel layer and have access to the groups required to communicate back with the websocket consumers.

We start a worker like so:

$ python manage.py runworker game_engine

The game_engine argument refers to the channel name in your routing.py:

application = ProtocolTypeRouter(
    {
        "websocket": SessionMiddlewareStack(URLRouter([url(r"^ws/game/$", PlayerConsumer)])),
        "channel": ChannelNameRouter({"game_engine": GameConsumer}),  # THIS RIGHT HERE
    }
)

So the worker process runs separately and creates the GameConsumer which then starts the GameEngine in an infinite loop. The PlayerConsumer websocket consumers can then publish data to the GameConsumer:

async def direction(self, msg: dict):
    await self.channel_layer.send(
        "game_engine",
        {
            "type": "player.direction", 
            "player": self.username, 
            "direction": msg["direction"]
        },
    )

And the GameConsumer can receive that message (player.direction is converted to the method player_direction on the receiver):

def player_direction(self, event):
    direction = event.get("direction", "UP")
    self.engine.set_player_direction(event["player"], direction)
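The type-to-method convention tripped us up at first, so here is a tiny stand-in that mimics it without needing Channels installed. Treat it as a sketch of the naming rule only: the Dispatcher and FakeGameConsumer classes are made up for this example and are not how Channels is implemented internally.

```python
class Dispatcher:
    """Toy dispatcher: routes a message to the method named by its type."""

    def dispatch(self, message: dict):
        # "player.direction" becomes "player_direction"
        handler_name = message["type"].replace(".", "_")
        if handler_name.startswith("_"):
            raise ValueError("Refusing to dispatch to a private method")
        handler = getattr(self, handler_name, None)
        if handler is None:
            raise ValueError(f"No handler for message type {message['type']}")
        return handler(message)


class FakeGameConsumer(Dispatcher):
    """Stand-in for GameConsumer that just records the last direction."""

    def __init__(self) -> None:
        self.directions = {}

    def player_direction(self, event: dict) -> None:
        self.directions[event["player"]] = event.get("direction", "UP")


consumer = FakeGameConsumer()
consumer.dispatch(
    {"type": "player.direction", "player": "username-1", "direction": "LEFT"}
)
```

The practical consequence is that every message you send over the channel layer needs a type key, and the receiving consumer needs a method whose name matches it once the dots are replaced.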

Something to note is that there are actually two communication paths in play here, and only one of them is a group. The first is the snek_game group, which all of the players are connected to, and which the game publishes its state to. The second is game_engine, which is a named channel rather than a group: it is dedicated to carrying player join and player direction messages from the players to the game engine.
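The difference between sending to a single named channel and publishing to a group can be sketched with a toy, in-memory layer. This is purely illustrative and is not channels_redis or any real channel layer: the InMemoryLayer class is hypothetical, and only its method names (send, group_add, group_send, receive) are borrowed from the calls used elsewhere in this post. The player-1 and player-2 channel names are invented.

```python
import asyncio
from collections import defaultdict


class InMemoryLayer:
    """Toy channel layer: one queue per channel, groups are sets of channels."""

    def __init__(self) -> None:
        self.channels = defaultdict(asyncio.Queue)
        self.groups = defaultdict(set)

    async def send(self, channel: str, message: dict) -> None:
        # Point-to-point: exactly one channel receives the message.
        await self.channels[channel].put(message)

    async def group_add(self, group: str, channel: str) -> None:
        self.groups[group].add(channel)

    async def group_send(self, group: str, message: dict) -> None:
        # Fan-out: every channel in the group gets its own copy.
        for channel in self.groups[group]:
            await self.send(channel, dict(message))

    async def receive(self, channel: str) -> dict:
        return await self.channels[channel].get()


async def demo():
    layer = InMemoryLayer()
    await layer.group_add("snek_game", "player-1")
    await layer.group_add("snek_game", "player-2")
    # A player tells the engine about a new direction (named channel).
    await layer.send(
        "game_engine",
        {"type": "player.direction", "player": "username-1", "direction": "LEFT"},
    )
    # The engine publishes the new state to every player (group).
    await layer.group_send("snek_game", {"type": "game_update", "state": {"tick": 144}})
    return (
        await layer.receive("game_engine"),
        await layer.receive("player-1"),
        await layer.receive("player-2"),
    )


engine_msg, player_1_msg, player_2_msg = asyncio.run(demo())
```

In the real deployment Redis plays the role of the queues, which is what lets the web process and the worker process talk to each other even though they share no memory.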

The following (simplified) diagram shows all of the components involved, and sample messages that are sent between them.

Diagram

Running the application

There are a few resources out there that describe how to deploy a channels application, but unfortunately many of them are out of date with version 2 of channels. For example, a worker service is no longer required if you just want to handle websockets. Daphne can now do it all.

I'm not going to touch on optimum configuration for a service at scale. With more complex architecture comes more complex operations and failure modes. Each websocket server can only maintain a finite number of clients, since they are persistent, so you'll need to think about how you'll scale your services for any real deployment.

For our deployment, we chose Heroku. They have excellent resources for deploying Django applications so I'm only going to call out where the docs differ. Here are some high level config tasks before getting into the harder parts.

Create the app within the heroku dashboard
Add a node.js buildpack with a custom npm script for building webpack:

"scripts": {
    "heroku-postbuild": "webpack --mode production --config=webpack.config.js --optimize-minimize"
  },

Add a python buildpack as the second buildpack (it needs webpack to have compiled into the static folder first)
Add a redis and postgres resource, so that $DATABASE_URL and $REDIS_URL are available to our app
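As a rough idea of how $REDIS_URL might be wired into the channel layer, here is a minimal settings fragment. It is a sketch rather than a copy of kogame/settings.py: the "kogame.routing.application" path is an assumption about where the routing module lives, and the localhost fallback is only there for local development.

```python
import os

# Assumed dotted path to the ProtocolTypeRouter instance; adjust to your project.
ASGI_APPLICATION = "kogame.routing.application"

# Point the channel layer at the Redis instance Heroku provisions.
# The fallback URL is an assumption for running locally.
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {
            "hosts": [os.environ.get("REDIS_URL", "redis://localhost:6379")],
        },
    }
}
```

Both the web process and the worker process load the same settings, which is how they end up talking to the same Redis-backed channel layer.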

Ok, onto the fun stuff.

You're going to need to customise the Procfile to properly run daphne and our game engine worker:

# Procfile
release: python manage.py migrate --noinput
web: daphne kogame.asgi:application --port $PORT --bind 0.0.0.0
worker: python manage.py runworker game_engine -v2

Heroku will provide the $PORT environment variable when starting the application. We need two services to run. The web service will be started automatically by Heroku. The worker service, which will run our game engine, will need to be manually scaled after the initial deployment.

See that we've told daphne to load an asgi application? It looks just like a normal wsgi file, but allows channels to set up properly and load the routing.py definition:

# kogame/asgi.py
import os
import django
from channels.routing import get_default_application

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "kogame.settings")
django.setup()
application = get_default_application()

That's it, provided you've followed the Heroku documentation up until we started messing with daphne. Check out kogame/settings.py if you need some inspiration with your own configuration. Let's actually run this thing:

$ heroku login
$ heroku git:remote -a <app-name>
$ git push heroku master
$ heroku ps:scale worker=1:free -a <app-name>  # start the worker

Thoughts on Channels

We had a lot of fun building this game, but we also ran into a lot of hurdles along the way. There aren't many tutorials around the web that deviate very far from a toy chat implementation. The good tutorials out there were mostly out of date when we began. The documentation is mostly good but is still missing some of the nuances that you need to know (like the type key of a message mapping to a method on the message receiver).

Once we had the architecture set up and the code written though, everything worked as advertised. It hides so very much of the complexity of a distributed system that, once you have a good grip on the role of each component, it becomes easy to add additional behaviour.

I personally think there is a big open space for tooling on top of channels, such as specialised Consumers, message validation libraries, and group interfaces. Once more and more people begin using channels in production applications, patterns will emerge, as will the libraries to address and enforce them.

It's an exciting time to be working with Django! And remember, if hackdays sound like your kind of fun, and Django is your cup of tea, we're hiring!

Kogame (Koh-Gah-Mi) - A real time game in Django