This solution presents a highly scalable and reliable gaming implementation on Google Cloud Platform that uses Google App Engine and Google Compute Engine for real-time player interactions. The solution powers core game elements, such as game matchmaking and player customization, by using App Engine, while using Compute Engine to run dedicated game servers and common game engines.
Key points covered in this solution include:
  • Scaling to serve hundreds to millions of players.
  • Using Cloud Platform to build a fully featured game experience.
  • Leveraging App Engine for front-end interactions and maintaining game state in the datastore.
  • Orchestrating and autoscaling Compute Engine dedicated game servers with App Engine.
  • Gaining business insights by analyzing massive datasets that contain information about users and game statistics.
As the number of people who play games continues to increase, the amount of computing resources you need to power compelling gaming experiences can grow quite large. Online gaming has grown from just a few people running game servers in their garages to millions of players enjoying a seamless online experience with matchmaking, in-game purchases, and friend lists. These common game components have resulted in the development of sophisticated, distributed systems that can rival high-performance computing and large-scale web implementations. The fierce competition to develop and deliver the next blockbuster game or viral social sensation requires you to carefully manage your resources to focus on critical game components.
Overprovisioning a cloud platform or focusing on unnecessary complexities leaves significantly fewer human and financial resources for gameplay and graphic assets. By using Google Cloud Platform, you can focus on creating unique game experiences while taking advantage of Google’s extensive experience in developing distributed systems.
This solution leverages the scalability and reliability of App Engine to match players with game sessions that run on Compute Engine game servers. App Engine is an extensible platform that can provide required features, such as user profiles, game matchmaking, in-game purchases, social communities, and mobile engagement. Ideally, you would use App Engine to power all aspects of the online game, but you might want access to virtual machines (VMs) for running common game engines and software development kits (SDKs). That’s where Compute Engine comes in. Though many aspects shown in this solution can be used for a pure App Engine implementation, the primary focus here is on running dedicated game servers on Compute Engine.
The services used in this solution include:
  • Google App Engine
    • Powers the main graphical user interface to provide game and user settings.
    • Provides matchmaking and server browsing.
    • Distributes load to Compute Engine instances.
    • Maintains clusters to handle player gameplay load.
  • Google Compute Engine
    • Runs custom game servers.
  • Google BigQuery
    • Analyzes massive game and user data sets.
  • Google Cloud Storage
    • Stores game server binaries.
    • Distributes game client binaries and game assets.
    • Stores backup logs to process and load into BigQuery.

Overview of the solution

The reference architecture diagram, shown in Figure 1, provides a high-level overview of how Compute Engine and App Engine integrate to create a scalable and reliable online gaming solution.

Figure 1. Reference architecture diagram for an online gaming solution

Key components of the online gaming solution

A user starts playing a match of their favorite game by loading the local application or navigating to the game’s website. If it is their first time playing, all client binaries and game assets can be downloaded from Cloud Storage. Although game clients for mobile devices and personal computers vary, the core game features can be provided for all devices. Core features include updating user profiles, managing player configurations, and checking friends’ achievements. App Engine can be used for all devices by directly serving websites or providing a RESTful API for accessing all the required information.
The following sections describe each of the key components of the proposed gaming solution in more detail.

Selecting a game server

Allowing players to join a game server and interact with other players is one of the most important components of the main interface. Matchmaking is an integral part of this gaming solution because it matches players with other people in the same region who want the same game mode. Depending on the search, performance, and scalability requirements, this solution can also be extended to include a full-featured server browser and search capability by leveraging Cloud SQL, Google Custom Search API, or Google Cloud Datastore.

Connecting to the dedicated game server

After the player chooses a server to join and the game client receives the dedicated server’s IP address, the player’s game client establishes a connection to the dedicated server running on Compute Engine and loads in-game assets.
The Compute Engine game servers are responsible for handling all player interactions through low-latency client-server communication. Information about designing a multiplayer game server is beyond the scope of this paper. When designing multiplayer game servers, consider building on existing game servers and SDKs.

Managing in-game requests and Compute Engine instance health checks

When a dedicated game server is running on Compute Engine, it might need to send in-game requests to App Engine. If a player has purchased items from a store or created custom game configurations, App Engine can maintain this information. Also, the dedicated game server can communicate back to App Engine to update player scores, statistics, and experience levels.
After a round has completed, players can either remain on a game server for a new round or be redirected to server matchmaking. Players’ scores, match statistics, and in-game store recommendations can be displayed between rounds. If a dedicated game server terminates unexpectedly, the game client must handle this event and redirect players to server matchmaking for a new session.

Autoscaling game servers

Autoscaling is a background task that does not directly affect gameplay, but it is critical to building a scalable, fully featured game. In this solution, the autoscaling logic for dedicated game servers is implemented by the developer in App Engine. As the number of players increases, VM orchestration logic creates new dedicated servers to handle the increased load. Similarly, if the number of players decreases during the day, unused dedicated servers can be shut down to reduce costs.

Storing logs for analysis

Cloud Storage is a good choice for storing files, such as server logs and output from analysis pipelines. The dedicated game servers on Compute Engine produce a significant amount of valuable data for understanding player behavior and troubleshooting software bugs. To store this data long-term, a background process should regularly upload files from Compute Engine instances to Cloud Storage. If analysis pipelines are required for transforming and aggregating data, the relevant files can be downloaded from Cloud Storage and processed on additional Compute Engine instances. Output from jobs can be stored in Cloud Storage, where it can be used as input for additional pipelines, ingested into Google BigQuery, or compiled into reports. For more information, see Processing Logs at Scale Using Cloud Dataflow and Real-time Logs Analysis Using Fluentd and BigQuery.

Analyzing massive user and game datasets using BigQuery

Integrated into this solution is BigQuery, an ad-hoc query tool for analyzing massive datasets in real time. When dedicated game servers host millions of active players, you can get billions of rows of useful data. Whether it’s raw game logs or output from analysis pipelines, the data can be loaded into BigQuery using a predefined schema. After loading has finished, SQL-like queries complete within seconds and can be used to obtain valuable information such as user engagement and the impact of game incentives.

Implementation details

This section provides implementation details for distributing player load and providing the core game functionality required for a full-featured game experience. The primary focus of this solution is on the core scenario of distributing game servers to handle real-time player interactions. This solution can be expanded to provide additional features, such as a full store and a social community model, but these features are outside the scope of this paper.
The following architecture diagram shows a scalable, dedicated-server gaming solution.

Figure 2. Implementation details for a dedicated-server gaming solution

Requesting a game server

Players use the game server browser to request a list of recommended game servers based on matchmaking criteria. This request is submitted through Google Cloud Endpoints, which provides an authenticated RESTful API powered by App Engine.

Matchmaking logic

Matchmaking logic provides users with a list of recommended servers. Depending on the scale and frequency of matchmaking requests, you can use various techniques to implement this solution. One approach is to use App Engine background tasks to maintain a list of recommended servers in each datacenter and then store the list in Memcache for quick retrieval. The logic for recommending servers depends on the type of game. Some games, for example, recommend servers with the lowest load to minimize latency, while other games need a minimum number of people connected to each server for more matchmaking permutations. Although Memcache provides a high-performance, distributed-memory, object-caching system, the recommendations must also be stored in Cloud Datastore to handle Memcache evictions. The recommendation background tasks can be scheduled to run every minute by the App Engine Cron service. It’s important for the background tasks to maintain a list of recommended servers for each region because players usually want to connect to the lowest latency server. Other server selection techniques may round-robin through available servers or provide a reasonably sized list to the client so it can identify the lowest-latency servers. More complex solutions can include tasks that maintain player counts, load, latency, and states for all servers. Solutions can also be designed to query dynamically for each request.
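As a minimal sketch of the cached-recommendation approach described above, the following Python uses plain dictionaries as stand-ins for Memcache and Cloud Datastore. The names GameServer, refresh_recommendations, and get_recommendations, and the choice to rank by lowest load, are illustrative assumptions and are not taken from the sample application.

```python
from dataclasses import dataclass

# Hypothetical stand-ins: a real deployment would use Memcache and Cloud Datastore.
cache = {}      # fast, but entries may be evicted at any time
datastore = {}  # durable copy used when the cache misses


@dataclass
class GameServer:
    ip: str
    region: str
    players: int
    capacity: int  # assumed to be greater than zero
    healthy: bool = True


def refresh_recommendations(servers, per_region=3):
    """Background task: rank healthy, non-full servers per region by load."""
    by_region = {}
    for s in servers:
        candidates = by_region.setdefault(s.region, [])
        if s.healthy and s.players < s.capacity:
            candidates.append(s)
    for region, candidates in by_region.items():
        candidates.sort(key=lambda s: s.players / s.capacity)
        ips = [s.ip for s in candidates[:per_region]]
        datastore[region] = ips  # durable write first
        cache[region] = ips      # then populate the cache


def get_recommendations(region):
    """Request path: read the cache, fall back to the durable copy."""
    ips = cache.get(region)
    if ips is None:
        ips = datastore.get(region, [])
        cache[region] = ips  # repopulate after an eviction
    return ips
```

In this sketch the scheduled task would call refresh_recommendations once per run, and each matchmaking request would call get_recommendations with the player's region. A game that needs a minimum number of players per server would swap the sort key rather than change the caching structure.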

Returning matchmaking results

The results from matchmaking are returned to the client, where the player either selects from a list or the client automatically chooses the ideal server.

Connecting to the dedicated server

The game client attempts to connect to the IP address of the selected dedicated game server. If the connection fails or the server is full, the client can try to connect to alternate suggested servers or direct the player back to server matchmaking.
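The fallback behavior can be sketched as a loop over the recommended servers. This is an illustration only: try_connect is a hypothetical callable supplied by the game client that returns True on success, returns False when the server is full, and raises OSError on a network failure.

```python
def connect_with_fallback(server_ips, try_connect):
    """Return the first IP that accepts the connection, or None.

    A None result means the caller should send the player back to
    server matchmaking.
    """
    for ip in server_ips:
        try:
            if try_connect(ip):
                return ip
        except OSError:
            # Connection refused, timed out, and similar: try the next server.
            continue
    return None
```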

Handling in-game requests

After players establish a connection to the dedicated game server, the server is responsible for handling all events from the player and providing information about other players currently in the match. App Engine maintains a consistent game experience across all dedicated servers by handling important events and providing player information. For example, if the player has a custom configuration, a request to App Engine provides information about the configuration and allows the player to access all purchased items. As players gain experience and important in-game events occur, details are sent to App Engine to maintain a complete view of all players. An authenticated request to Cloud Endpoints and the provided RESTful API is an easy way to connect the game servers to App Engine.

Requesting player configurations

When games allow a player to purchase items or create a custom character configuration, the information must be maintained in a reliable and scalable database. Cloud Datastore is designed to scale for web applications that serve millions of users, and it is recommended for storing all player information because it scales seamlessly as a game grows from hundreds to millions of players. Memcache can also be used to store results from frequent Cloud Datastore queries to improve performance. Because Memcache is a finite resource, be careful about how you use it. If you need complex SQL queries or need MySQL for other reasons, Cloud SQL provides a fully managed and highly available relational database-as-a-service. Although it is tightly integrated with App Engine, Cloud SQL is not designed to scale infinitely, and load testing is highly recommended to measure real-world database performance. For high performance, where keeping costs to an absolute minimum is not a priority, consider using Google Cloud Bigtable for suitable scenarios and data types.

Storing important in-game events

Handling and storing important events, such as players gaining experience after in-game actions, is a critical part of creating an engaging game. As with requests for player configuration, these requests are handled by App Engine, and you can store key information in Cloud Datastore. The major difference between the two types of requests is that in-game events can occur at a much higher frequency for all active players. For example, a player’s configuration might be obtained only at the start of the match, whereas in-game events can happen every time a player’s character gains experience. Although Cloud Datastore can scale to handle thousands of events from millions of users, you should understand entity groups, NoSQL, and eventual consistency to avoid potential scalability problems. Cloud Bigtable can also be a good solution here for applicable scenarios and data types.

Tracking server health

A critical component of maintaining a healthy cluster of dedicated game servers is continual tracking of each server’s statistics and health. Once again, you can use Cloud Endpoints to provide an authenticated RESTful API to which a process running on each Compute Engine instance reports usage statistics. Hardware-related information, such as CPU and RAM usage, can be provided along with game-specific information, such as average player latency and the number of players active on the server.
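As a rough sketch, the report that such a process sends might be assembled as follows. The field names and the build_heartbeat function are assumptions made for illustration; the source does not define a payload format, and authentication and the HTTP call itself are omitted.

```python
import json
import time


def build_heartbeat(server_id, cpu_percent, ram_percent, latencies_ms, now=None):
    """Assemble one heartbeat report as a JSON string.

    latencies_ms holds one latency sample per connected player, so its
    length doubles as the active player count in this sketch.
    """
    avg_latency = sum(latencies_ms) / len(latencies_ms) if latencies_ms else 0.0
    report = {
        "server_id": server_id,
        "timestamp": time.time() if now is None else now,
        "cpu_percent": cpu_percent,
        "ram_percent": ram_percent,
        "active_players": len(latencies_ms),
        "avg_latency_ms": round(avg_latency, 1),
    }
    return json.dumps(report, sort_keys=True)
```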

Storing server health and statistics

The heartbeat process running on each Compute Engine instance can provide valuable information, and server heartbeat logic in App Engine is required to parse and store the relevant data. Information directly related to autoscaling, such as the number of players active on servers and average latency, should be cached in Memcache for quick retrieval by the autoscaling backend processes. Any important values should also be stored in Cloud Datastore to protect against Memcache eviction. If this information is also relevant for analytics and maintaining server history, store all historical values in a separate table that is used independently of autoscaling.

Autoscaling dedicated game servers and maintaining a healthy cluster

Although there are many approaches to autoscaling Compute Engine resources with respect to player load, the common component is a scheduled task that runs every minute with App Engine’s Cron service. You can calculate the ideal number of virtual machines from a predetermined schedule or by analyzing the number of available positions in game servers or player latency. The other important input to autoscaling is the number of currently active, healthy machines, which you determine by pulling recent heartbeat data from Memcache or Cloud Datastore. You can compute the difference between the ideal and current number of game servers to decide when and how to create or delete instances. Additionally, any unhealthy servers should be removed from server selection, and their instances should be deleted once no players remain on them. If game servers need to be migrated between Compute Engine zones, the autoscaling logic can create instances in the new zone while terminating vacant instances in the old zone. This is a very high-level overview of autoscaling game servers, and you should implement the scaling algorithms carefully. Focus on avoiding issues such as overshoot and noisy responses. Compute Engine servers are billed on a per-minute basis with a minimum 10-minute charge, so to reduce unnecessary costs from unused Compute Engine instances, avoid frequently creating and deleting instances.
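One possible shape for the calculation is sketched below. The function names, the 20 percent headroom, the heartbeat age limit, and the scale-down slack are all assumed values chosen for illustration. The sketch scales up immediately but scales down only when the surplus exceeds the slack, which is one simple way to damp noisy responses.

```python
import math


def healthy_servers(heartbeats, now, max_age_s=120):
    """IDs of servers whose last heartbeat is recent enough to count as healthy.

    heartbeats maps a server ID to the timestamp of its latest report.
    """
    return [sid for sid, ts in heartbeats.items() if now - ts <= max_age_s]


def scaling_delta(players, slots_per_server, current,
                  headroom=0.2, scale_down_slack=2, min_servers=1):
    """Servers to add (positive) or remove (negative) on this run."""
    ideal = max(min_servers,
                math.ceil(players * (1 + headroom) / slots_per_server))
    delta = ideal - current
    if delta >= 0:
        return delta  # scale up promptly so players are not turned away
    # Scale down only when the surplus is large enough to be worth the churn.
    return delta if -delta > scale_down_slack else 0
```

For example, with 1,000 players, 32 slots per server, and 30 healthy servers, the sketch asks for 8 more servers; with 39 healthy servers it does nothing, because the surplus of one server is within the slack.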

Creating and deleting dedicated game servers

When a game server must be created or deleted, a task is added to an App Engine task queue. A separate background task is responsible for pulling server maintenance tasks and making Compute Engine API calls. Additional backends can be added if the number of required API calls increases beyond the limits of a single backend. If there are few Compute Engine API calls, server maintenance can be handled by a scheduled task to reduce usage of App Engine resources. You should include a timestamp with every server maintenance task so that you can create alerts if a backlog develops in the system. You can use push queues as an alternative to pull queues. You should run load tests to evaluate how each autoscaling system responds under heavy utilization. Although Google does not provide a load testing service, common open source technology can be run on Compute Engine, or third-party services can be used for extensive load testing.
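The timestamping idea can be illustrated with an in-memory queue. Here a deque stands in for the App Engine task queue, and compute_api is a hypothetical callable that would wrap the real Compute Engine API calls; none of these names come from the source.

```python
import time
from collections import deque

maintenance_queue = deque()  # hypothetical stand-in for an App Engine task queue


def enqueue(action, server_id, now=None):
    """Add a create/delete task stamped with its enqueue time."""
    if action not in ("create", "delete"):
        raise ValueError("unknown action: %s" % action)
    maintenance_queue.append({
        "action": action,
        "server_id": server_id,
        "enqueued_at": time.time() if now is None else now,
    })


def backlog_age(now=None):
    """Seconds the oldest pending task has waited; 0.0 if the queue is empty."""
    if not maintenance_queue:
        return 0.0
    now = time.time() if now is None else now
    return now - maintenance_queue[0]["enqueued_at"]


def process_next(compute_api):
    """Worker: pull one task and pass it to the injected Compute Engine client."""
    if not maintenance_queue:
        return None
    task = maintenance_queue.popleft()
    compute_api(task["action"], task["server_id"])
    return task
```

A monitoring job could call backlog_age on each run and raise an alert when the value exceeds a threshold that suits the game, for example a few minutes.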

Storing logs in Cloud Storage

Many log files can be generated on each game server, such as in-game server logs that record every player’s actions and movements, or logs that record end-game statistics. You can copy these files to Cloud Storage by using a background process that runs at regular intervals. If the files contain critical data, you should store them on a persistent disk to prevent data loss, such as when an instance terminates before the copying process is done. Otherwise, storing files on a standard disk provides a lower-cost alternative, but the disk will be deleted immediately after an instance terminates. Regardless of disk choice, you should have an automated copying process for maintaining all logs and statistics in Cloud Storage.
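A single pass of such a copying process might look like the following sketch. The upload callable is a hypothetical placeholder for whatever performs the transfer to Cloud Storage, and the sketch assumes that each log file is complete before it appears in the directory (for example, because the server rotates logs into it).

```python
import os


def copy_new_logs(log_dir, upload, already_copied):
    """Upload every regular file in log_dir that has not been copied yet.

    upload(path, name) is supplied by the caller; already_copied is a set
    of file names that is updated in place. Returns the names copied now.
    """
    copied = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if name in already_copied or not os.path.isfile(path):
            continue
        upload(path, name)
        # Record the file only after the upload returns without raising.
        already_copied.add(name)
        copied.append(name)
    return copied
```

Running this function on a timer gives the regular-interval behavior described above. Because a file is marked as copied only after upload returns, a failed upload is retried on the next pass.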

Transforming and processing log files

After collecting a large amount of raw log data from servers, the log files need to be cleaned, augmented with additional data, and aggregated to different levels. You can use MapReduce or extract, transform, and load (ETL) tools, such as Google Cloud Dataflow, to create data that you can use for user-facing features, such as purchase recommendations, or load into Google BigQuery for analysis.

Reporting and analytics

BigQuery is an important part of a gaming solution because it allows ad-hoc analysis of massive datasets that contain user- and game-related information. For example, BigQuery can be used to determine the impact of gameplay incentives, such as store sales, on user retention and engagement. BigQuery maintains consistent performance as data scales to terabytes and billions of rows.

Sample app

A sample application that demonstrates the high-level concepts of this solution is available and can be used as a working reference. The core functionality of the sample app includes:
  • The client queries App Engine for an IP address of a dedicated game server.
  • The client starts a new game by connecting to a game server running on Compute Engine.
  • Administrators can create and delete game servers from the App Engine administration UI.
  • Compute Engine instances report load levels to App Engine periodically.
  • Administrators can view load levels of all game server Compute Engine instances.
  • App Engine automatically adds new instances to the cluster if the cluster exceeds a maximum-load threshold.

Download the source code from GitHub.

