In part one we introduced the basics of matchmaking, running through Quality of Service (QoS), skills and attributes, and algorithms.
In part two we pick up the thread once more with Caleb Atwood, who is part of the team working on the Connected Games solutions for Unity and Multiplay. In this blog we’re going to dive into allocations and infrastructure, and then run through some of the services we’re working on at Unity and Multiplay.
Before we get to the infrastructure, what do we do when we finally have a match of players and need to start spinning up game server resources for those matches?
For many server runtimes, there will also be a non-trivial cost to spinning up a game server from scratch: time spent initializing the engine, loading collision models, downloading dynamic configurations, and opening connections for players. Making players wait for that spin-up could mean seconds, or even minutes, especially if you’re also having to request additional cloud compute VMs on the fly.
While there are lots of ways to provide “on-demand compute”, most approaches involve predicting and pre-scaling server capacity, so that your match creation rate can grow into servers that are already running and available. For Multiplay, the lifecycle piece is handled by an Allocation [https://docs.multiplay.com/display/CF/Guide+to+Clanforge+Allocations]. In simple terms, an allocation is a desire to have a game server of a specific profile in a given region. Multiplay handles the rest, including prediction-based scaling, spinning up extra capacity, exposing game ports, starting the game server runtime, and eventually returning the necessary information for clients to connect to the server.
In order to take full advantage of allocations and the idea that a game server is preempted and waiting for connections, you also have to take some additional steps to “put a server back on the market” once the game is completed and the server has finished reinitializing for a new game. By doing this, you reduce the compute time spent spinning up the game server runtime from scratch. We call this a Deallocation.
To achieve this, you’ll want to track the game-in-progress at a level sufficient to determine when a game server is ready to be used by another match coming down the pipe.
- Take a match that’s ready to go
- Allocate a server and save the AllocationId
- Poll for the Allocation to receive a running server (ip and port)
- Notify the players of the connection information
- Wait for the game to finish
- Call Deallocate on the AllocationId, freeing up the now-ready server for another game
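The steps above can be sketched in code. This is a minimal illustration with an in-memory stand-in for the allocation service; the real Multiplay API calls, field names, and values are all assumptions here.

```python
from dataclasses import dataclass

@dataclass
class Allocation:
    allocation_id: str
    ip: str
    port: int

class FakeAllocationClient:
    """In-memory stand-in for a hypothetical allocation client."""
    def allocate(self, profile, region):
        return "alloc-123"                    # 2. save the AllocationId
    def poll(self, allocation_id):
        # 3. blocks until the allocation has a running server (ip and port)
        return Allocation(allocation_id, "203.0.113.7", 9000)
    def deallocate(self, allocation_id):
        pass                                  # 6. server back on the market

def run_match(client, match, profile, region, notify, wait_for_game_end):
    allocation_id = client.allocate(profile, region)  # 1-2. match ready, allocate
    server = client.poll(allocation_id)               # 3. receive ip and port
    for player in match["players"]:
        notify(player, server.ip, server.port)        # 4. connection info
    wait_for_game_end(match)                          # 5. game in progress
    client.deallocate(allocation_id)                  # 6. free the warm server
    return server
```

In a real service, `wait_for_game_end` would watch game server state, heartbeats, or a results webhook rather than block a worker thread.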
There are several key implications created when matchmaking is combined with a preemptible game server runtime provider.
The first is the where. Either…
a) The matchmaking system is aware of the available regional capacity of running game servers. This could be achieved via the matchmaking backend supporting some sort of declarative registration mechanism for running capacity.
b) The game server management system provides an abstraction of (a), creating a best-effort approach to fulfilling the matchmaker’s demand for servers in regions based on the matchmaker’s population and outputs. This model requires that the server provider be ‘scalable’ and reactive to capacity needs.
Each has merits, and really (b) could be thought of as a system wrapped around (a). Ideally, a matchmaking service would support some features of both. In (a)’s case, bringing your own servers allows for rapid development cycles and direct access to the server runtime. However, you’ll need to implement your own scaling and capacity planning. In (b)’s case, scaling and capacity are handled by the provider, and anything else you need has to be a feature the provider offers.
For a scalable provider like Multiplay, the benefits of (b) for the matchmaker are clear. The matchmaking system “speaks” allocations and asks for a game server in regions where the game could be played, and Multiplay handles the regional capacity scaling.
Game server profiles
The second implication is which build should be spun up. For matchmaking in particular, there may be a desire to support multiple versions or multiple images of server builds simultaneously, pushing beta players into experimental builds of the game. For Multiplay, this is handled by the Game Server Profile.
Since an allocation request from the matchmaking system is a desire for a server of a specific profile in one or more regions, Multiplay supports this out of the box! Selecting the profile is then a matter of making the matchmaking service understand which players route to which profiles during the matchmaking algorithm.
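As a sketch of that routing step, the matchmaker can bucket the population by profile before matching, so that matches only ever contain players who route to the same build. The profile names and the beta flag below are invented for illustration; the real mapping would live in your game server profiles.

```python
# Hypothetical profile names; the real ones come from your server profiles.
STABLE_PROFILE = "game-server-stable"
BETA_PROFILE = "game-server-beta"

def route(player):
    """Decide which server build this player should be matched into."""
    return BETA_PROFILE if player.get("beta_opt_in") else STABLE_PROFILE

def pools_by_profile(players):
    """Split the population so the algorithm only matches players that
    route to the same profile, and allocations can request that profile."""
    pools = {}
    for player in players:
        pools.setdefault(route(player), []).append(player)
    return pools
```

Each resulting pool is matched independently, and the pool’s profile is what goes into the allocation request.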
Now that we understand the two halves of matchmaking, what do we actually need to build for our production-ready matchmaking service?
- A scalable frontdoor to handle incoming client requests and connections
- A performant database for indexing the player data into easy-to-query pools
- A backend for running the matchmaking algorithms with access to the queryable data
- A metrics system with the tooling to understand the population data for algorithm decision-making, iterating, and debugging
- A scalable system for creating, monitoring, and returning game server allocation information
- A system for triggering deallocations based on the game server state
For this post, we won’t explore the pros and cons of multi-region matchmaking services in-depth. At a high level, by paying more for service redundancy, you increase reliability and availability, but risk potentially partitioning your player population into disparate sections, creating accidental side-effects like “localized skill.” Assuming your matchmaking data can be reassociated into global data (like a global skill or ladder), then the risk can be somewhat mitigated.
Indexing and data
At any given point, there may be thousands, tens of thousands, or even hundreds of thousands of players trying to find games at the same time. One great matchmaking design problem is waiting for the “right” amount of time so that a player gets into a good match instead of a rushed match.
In order to work on that problem, you’ll want a highly indexable database and probably a consolidated data-logic layer to help you optimize your queries for performance.
When looking at breaking up your population into queryable segments, consider moving as much of your matchmaking logic as possible into precomputed indexes (think fitness vectors, predictive labels, etc.).
This type of information can drastically reduce the number of considerations the algorithms will need to compute. The better you understand your data going into the matchmaker, the faster and more accurate you can make the matchmaking algorithms.
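For example, index fields can be precomputed once at ticket-ingest time so the algorithms query cheap labels instead of raw attributes. The bucket sizes and field names below are made up for illustration:

```python
def index_ticket(ticket):
    """Precompute queryable labels for a matchmaking ticket at ingest time."""
    mmr = ticket["mmr"]
    # coarse fitness label instead of raw rating comparisons in the algorithm
    ticket["skill_bucket"] = min(mmr // 250, 12)
    # best-latency region from the client's QoS results
    ticket["region_pool"] = min(ticket["qos"], key=ticket["qos"].get)
    # one composite key the algorithm can query directly
    ticket["pool_key"] = f'{ticket["mode"]}:{ticket["region_pool"]}:{ticket["skill_bucket"]}'
    return ticket
```

With a composite key like `pool_key`, the algorithm’s candidate query becomes a single indexed lookup rather than a scan with per-ticket comparisons.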
Scale and speed
Traditionally, matchmaking systems that end up struggling follow a common playbook:
- Make a scalable frontdoor to handle client requests, storing the data into a shared memory cache
- Build a loop behind the service to execute all the algorithms on the cache
- Run it on repeat as fast as possible over the whole population
- Optimize by adding hard-filter queues and eventually partitioning the player population
While this can get your game out the door, it’s drastically bounded by the performance of the algorithms operating on large chunks of the population. Even if your hard-filter queues provide some reduction in scope, an overnight success will almost assuredly knock this over, and the new features and game design logic you add over time will inevitably prove too much.
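Reduced to a sketch, one pass of that playbook looks something like this (all names are illustrative). Every pass reads the entire cached population and runs every algorithm over it, which is why the cost of a pass grows with the player count:

```python
def pair_players(population):
    """Toy algorithm: greedily pair adjacent tickets."""
    for i in range(0, len(population) - 1, 2):
        yield (population[i], population[i + 1])

def match_pass(cache, algorithm, dispatch):
    population = cache.read_all()          # whole population, every pass
    matches = list(algorithm(population))  # O(n) here; real logic is far worse
    for match in matches:
        dispatch(match)                    # hand off to server allocation
    return matches

# The struggling playbook then runs match_pass on repeat, as fast as
# possible, on one backend, instead of sharding the work.
```

Once the time for one pass exceeds the rate at which tickets arrive, the loop falls permanently behind, and no amount of per-algorithm tuning fixes that.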
So where can we start to improve the reliability of the service? Luckily, there are some unique attributes about this type of data that we can use to spread out some of the load.
Namely, we can favor a more distributed approach. For example, the matchmaking data itself is somewhat ephemeral and tolerant of some latency. Taking advantage of a high-write primary + high-read replica database topology will prevent large reads by the algorithms from bottlenecking your ingress indexing. Additionally, spreading the algorithms out into schedulable runtimes across several machines will help alleviate CPU starvation.
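A minimal sketch of that read/write split, with in-memory objects standing in for real database connections (the operation names are placeholders, not any particular driver’s API):

```python
class FakeConn:
    """In-memory stand-in for a database endpoint."""
    def __init__(self):
        self.calls = []
    def execute(self, op, payload=None):
        self.calls.append((op, payload))
        return []

class MatchmakingStore:
    def __init__(self, primary, replica):
        self.primary = primary   # high-write: ticket ingest, index updates
        self.replica = replica   # high-read: algorithm scans, slightly stale

    def write_ticket(self, ticket):
        # The ingest path never contends with large algorithm reads.
        self.primary.execute("insert_ticket", ticket)

    def read_pool(self, query):
        # Matchmaking data tolerates replica lag: a ticket a few hundred
        # milliseconds stale is still a valid match candidate.
        return self.replica.execute("scan_pool", query)
```

The design choice here is simply that replica lag is acceptable for this data, which is not true of, say, inventory or payments, so the split costs almost nothing in correctness.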
This raises the question: has someone solved all of this for us already?
New and existing services
[Full-disclosure, I contribute to Open Match and the Unity hosted solution below.]
As you can imagine, the landscape of existing solutions is complicated, and what you choose will depend entirely on your needs (I bet you haven’t heard that a thousand times). However, there is a very common theme in many off-the-shelf matchmakers: they rely heavily on being configurable.
At the core of these matchmakers is a configurable component that allows the developer to tune specific parameters inside a generalized matchmaking algorithm and runtime.
While these systems solve for scale, the complaint we often hear is that configurability is cumbersome, limits customization, and, in some cases, can create a lock-in effect. Once locked in, the developer gets stuck deciding between cutting new features from their game due to limitations of the matchmaker, and risking an expensive backout while transitioning to a new or custom matchmaking system.
Open Match, co-founded by Google and Unity, is an open source game matchmaking framework that provides the core service infrastructure and performance necessary to run your own custom matchmaking logic at scale. It provides:
- Parallelized distributed match function lifecycle
- Custom match function algorithm execution
- Scalable, event-driven microservices architecture
- High availability data-layer for optimizing queries and reliability
- Extensible design for implementing custom match-quality evaluation
At its core is a component called a match function, which is a completely custom, hosted piece of code the developer can use to implement their tailor-made matchmaking algorithms. Where custom code is overkill, there will also be a canonical set of functions that provide more generalized and configurable approaches to common requirements in the style of configurable matchmakers. Of course, being open source means you can always fork the code and add whatever customizations you need where the canonical algorithms fall short.
Open Match (at v0.4 at the time of writing) does require some work and services experience to get started. However, updated docs/guides are coming soon and contributions are greatly welcome!
For more information on Unity/Google’s matchmaker, check out our talk from Unite Berlin:
Built on Open Match, the new Multiplay matchmaking system provides all the same underlying benefits, but also adds seamless integration with Multiplay’s allocation system should your game require a dedicated server architecture. Together, you get the benefits of a customizable multi-platform matchmaking service combined with a highly-scalable game server provider.
In terms of customization, one of the goals of the new system is to provide as much access to the underlying infrastructure as possible. This means custom match functions, persistent player data, and bring-your-own identity… as well as the tooling features you’ve come to expect from Unity.
It’s hosted in our cloud ecosystem, which means we also take care of the operation, monitoring, lifecycle, and scaling of the matchmaking infrastructure. This is especially handy if services development isn’t a priority for your team. Not to mention, the new service is also completely engine agnostic.
To stay in the loop, follow Multiplay on our social channels: LinkedIn, Twitter and Facebook. The project is still in alpha, but you can read more about the Connected Games roadmap in our blog and on this webpage.