4. Applications
Orleans has been used as a platform for building and running multiple cloud services by different teams, including all cloud services for the Halo 4 video game. This section describes three services built for two different games. Those services used Orleans to implement different parts of the game backend logic, with distinctly different usage patterns and performance characteristics. Most of the production scale and performance figures are confidential, so we report on measurements we performed in our lab in pre-production testing.
4.1 Halo 4 Presence Service
The Presence service is responsible for keeping track of all active game sessions, their participating players, and evolving game status. It enhances the matchmaking experience for players, allows joining an active game, enables real-time viewing of a game session, and supports other functionality. Each game console running Halo 4 makes regular heartbeat calls to the service to report the status of the game in progress. The frequency of the heartbeat calls is controlled by the service, so it may be increased for more interactive experiences, such as real-time viewing of a game via a companion mobile application. Additional service calls allow querying and joining live sessions, but we limit our description to just heartbeats.
In a multiplayer game session, each console sends heartbeat messages with game status updates to the service independently. The game session state is not saved to durable storage. It is only kept in memory because the ground truth is always on the consoles, and it takes only a single heartbeat update from any player to recover the game session state in case of a failure. The payload of a heartbeat message contains compressed session data including the unique session ID, the player IDs, and additional game data. The session data has to be decompressed before processing.
The structure of the Presence service is shown in Figure 1. There are three types of actors: Router, Game Session, and Player. Incoming heartbeat requests from consoles arrive at the Router actor, which decompresses the data, extracts the session ID, and forwards the request to the right Game Session actor. There is one Game Session actor for every session ID. The Game Session actor updates its internal in-memory state, and periodically, but not on every heartbeat, calls Player actors using the player IDs extracted from the heartbeat data. Player actors also serve as rendezvous points for an external observer, such as the mobile phone companion application, for finding the current game session for a given user. The observer first calls the user’s Player actor using her ID as the key. The Player actor returns a reference to the Game Session actor for the game that the player is currently playing. Then the observer subscribes to receive real-time notifications about updates to the game session directly from the Game Session actor.
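The heartbeat path described above can be illustrated with a minimal, single-process Python sketch. It is not the service's code: the JSON payload layout, the `notify_every` interval, and the plain dictionaries standing in for the Orleans runtime's actor directory are all assumptions made for illustration.

```python
import json
import zlib


class PlayerActor:
    """Rendezvous point: remembers which session the player is currently in."""

    def __init__(self, player_id):
        self.player_id = player_id
        self.session = None


class GameSessionActor:
    """One per session ID; session state lives in memory only."""

    def __init__(self, session_id, players, notify_every=3):
        self.session_id = session_id
        self.players = players            # dict: player_id -> PlayerActor (assumed directory)
        self.notify_every = notify_every  # hypothetical interval, not from the paper
        self.state = None
        self.heartbeats = 0

    def heartbeat(self, data):
        self.state = data
        # Call Player actors periodically, not on every heartbeat.
        if self.heartbeats % self.notify_every == 0:
            for pid in data["player_ids"]:
                player = self.players.get(pid)
                if player is None:
                    player = self.players[pid] = PlayerActor(pid)
                player.session = self
        self.heartbeats += 1


def route_heartbeat(payload, sessions, players):
    """Router step: decompress, extract the session ID, forward to the session."""
    data = json.loads(zlib.decompress(payload))
    sid = data["session_id"]
    session = sessions.get(sid)
    if session is None:
        session = sessions[sid] = GameSessionActor(sid, players)
    session.heartbeat(data)
    return session


sessions, players = {}, {}
payload = zlib.compress(json.dumps(
    {"session_id": "s1", "player_ids": ["alice", "bob"], "score": 3}).encode())
route_heartbeat(payload, sessions, players)

# An observer finds the live session through the Player actor.
observed = players["alice"].session
```

Because the session state is rebuilt from whatever the latest heartbeat carries, a lost `GameSessionActor` in this sketch is recovered by the next call to `route_heartbeat`, mirroring the recovery argument in the text.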
Since the Router actor is stateless, Orleans dynamically creates multiple activations of this single logical actor up to the number of CPU cores on each server. These activations are always local to the server that received the request to eliminate an unnecessary network hop. The other two actor types run in a single-activation mode, having 0 or 1 activation at any time, and their activations are randomly spread across all the servers.
The implementation of the service benefited from the virtual nature of actors, as the code the developers had to write for making calls to actors was rather simple: create a reference to the target actor based on its type and identity, and immediately invoke a method on the promptly-returned actor reference object. There was no need to write code to locate or instantiate the target actor and manage failures of servers.
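The calling pattern described above (obtain a reference from a type and an identity, then invoke a method) can be sketched in Python as follows. The `Runtime` and `ActorRef` classes and the `set_session` method are hypothetical stand-ins, not the Orleans API. The sketch is synchronous and single-process, whereas real actor calls are asynchronous and may cross servers.

```python
class Runtime:
    """Hypothetical stand-in for the runtime's directory of activations."""

    def __init__(self):
        self.activations = {}

    def get_ref(self, actor_type, identity):
        # Returns immediately: nothing is located or instantiated here.
        return ActorRef(self, actor_type, identity)

    def _activation(self, actor_type, identity):
        # Instantiate the actor on first use; reuse it afterwards.
        key = (actor_type, identity)
        if key not in self.activations:
            self.activations[key] = actor_type(identity)
        return self.activations[key]


class ActorRef:
    """Lightweight proxy that resolves its target only when it is used."""

    def __init__(self, runtime, actor_type, identity):
        self._runtime = runtime
        self._actor_type = actor_type
        self._identity = identity

    def __getattr__(self, name):
        # Invoked only for names not found on the proxy itself,
        # i.e. for the actor's own methods.
        target = self._runtime._activation(self._actor_type, self._identity)
        return getattr(target, name)


class Player:
    def __init__(self, player_id):
        self.player_id = player_id
        self.session_id = None

    def set_session(self, session_id):
        self.session_id = session_id
        return self.session_id


runtime = Runtime()
ref = runtime.get_ref(Player, "player-42")       # no activation exists yet
activations_before_call = len(runtime.activations)
result = ref.set_session("session-7")            # activates on demand, then calls
```

The caller never checks whether the `Player` exists or where it lives. In this sketch that decision is deferred to `_activation`, which is also where a real runtime would handle placement and recovery after a server failure.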
4.2 Halo 4 Statistics Service
Statistics is another vital Halo 4 service. It processes results of completed and in-progress games with details of important events, such as successful shots, weapons used, locations of events on the map, etc. This data accounts for players’ achievements, rank progression, personal statistics, match-making, etc. The service also handles queries about players’ details and aggregates sent by game consoles and the game’s web site. Halo 4 statistics are very important, as players hate to lose their achievements. Therefore, any statistics report posted to the service is initially pushed through a Windows Azure Service Bus [18] reliable queue, so that it can be recovered and processed in case of a server failure.
Figure 2 shows a simplified architecture of the Statistics service with a number of secondary pieces omitted to save space. The front-end server that receives an HTTP request with a statistics data payload saves the data in the Azure Service Bus reliable queue. A separate set of worker processes pull the requests from the queue and call the corresponding Game Session actors using the session ID as the actor identity. Orleans routes this call to an activation of the Game Session actor, instantiating a new one if necessary. The Game Session actor first saves the payload as-is to Azure BLOB store, then unpacks it and sends relevant pieces to Player actors of the players listed in the game statistics. Each Player actor then processes its piece and writes the results to Azure Table store. That data is later used for serving queries about players’ status, accomplishments, etc.
The operations of writing game and player statistics to the store are idempotent, so they can be safely replayed in case of a failure. If the request fails to be processed (times out or fails with an exception), the worker that dequeued the request will resubmit it.
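To illustrate why idempotent writes make replay safe, here is a small Python sketch. The `FlakyStore` class, the `max_attempts` limit, and the in-memory queue are assumptions for illustration. They stand in for Azure Table store and the Service Bus queue without using either API. The store applies each write and then sometimes fails, simulating a lost acknowledgement, so the same report is written more than once.

```python
from collections import deque


class FlakyStore:
    """Hypothetical in-memory store whose first few writes appear to fail."""

    def __init__(self, fail_first_n):
        self.rows = {}
        self.failures_left = fail_first_n

    def upsert(self, key, value):
        self.rows[key] = value  # the write is applied...
        if self.failures_left > 0:
            self.failures_left -= 1
            # ...but the caller sees a failure and cannot tell it succeeded.
            raise TimeoutError("simulated lost acknowledgement")


def process(report, store):
    # Keyed upserts: writing the same report twice leaves the same rows.
    for player_id, stats in report["players"].items():
        store.upsert((report["session_id"], player_id), stats)


def run_worker(queue, store, max_attempts=5):
    """Dequeue reports and resubmit any report that fails to process."""
    while queue:
        report, attempts = queue.popleft()
        try:
            process(report, store)
        except Exception:
            if attempts + 1 < max_attempts:  # assumed retry limit
                queue.append((report, attempts + 1))


store = FlakyStore(fail_first_n=2)
report = {"session_id": "s1",
          "players": {"alice": {"kills": 5}, "bob": {"kills": 2}}}
queue = deque([(report, 0)])
run_worker(queue, store)
```

After the two simulated failures, the third attempt completes and the store holds exactly one row per player, even though the first player's row was written three times.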
Orleans keeps Game Session actors in memory for the duration of the game. The actors process and accumulate partial statistics in a cache for merging at the end of the game. Similarly, a Player actor stays in memory while it is used for processing statistics or serving queries, and caches the player’s data. In both cases, caching reduces I/O traffic to the storage and lowers latency.
4.3 Galactic Reign Services
Galactic Reign is a turn-based, head-to-head game of tactical expansion and conquest for phones and PCs. Each player submits battle orders for a game turn. The game processes the turn and advances to the next one.
Galactic Reign uses four types of actors: stateful Game Session and stateless Video Manager, Housekeeper, and Notification actors. Each Game Session actor executes the game logic when battle orders for the turn are received from the players, and produces results that are written-through to persistent Azure Storage. Then it sends a request to the video rendering service to generate a set of short (up to 90s) videos for the turn.
Each Game Session actor holds a cached copy of the current session state for that game. The game state data can be large (multi-megabyte) and takes some time to read and rehydrate from the storage, so the system keeps active sessions in memory to reduce processing latency and storage traffic. Inactive game sessions are deactivated over time by the Orleans runtime. They are automatically re-activated later when needed.
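The keep-in-memory, deactivate-when-idle, reactivate-on-demand cycle can be sketched in Python as follows. The `SessionCache` class, the `idle_limit` threshold, and the explicit `now` argument are illustrative assumptions. The Orleans runtime performs the equivalent bookkeeping itself, and its actual deactivation policy is not described here.

```python
class SessionCache:
    """Keeps active session state in memory and drops sessions idle too long."""

    def __init__(self, load, idle_limit):
        self.load = load              # function: session_id -> state read from storage
        self.idle_limit = idle_limit  # hypothetical idle threshold, in seconds
        self.active = {}              # session_id -> (state, last_used)
        self.loads = 0                # counts expensive reads from storage

    def get(self, session_id, now):
        entry = self.active.get(session_id)
        if entry is None:
            # Not in memory: rehydrate from storage (the slow path).
            state = self.load(session_id)
            self.loads += 1
        else:
            state = entry[0]
        self.active[session_id] = (state, now)
        return state

    def collect_idle(self, now):
        """Deactivate every session not used within the idle limit."""
        idle = [sid for sid, (_, last) in self.active.items()
                if now - last > self.idle_limit]
        for sid in idle:
            del self.active[sid]
        return idle


cache = SessionCache(load=lambda sid: {"id": sid, "turn": 1}, idle_limit=60)
cache.get("g1", now=0)                 # first use: read from storage
cache.get("g1", now=5)                 # served from memory
evicted = cache.collect_idle(now=100)  # idle for 95 s, so it is deactivated
cache.get("g1", now=101)               # re-activated: read from storage again
```

Dropping an idle session loses nothing in this design, because the turn results are written through to persistent storage before the session becomes idle.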
A pool of Video Manager actors handles submitting jobs to the video rendering system and receiving notifications when render jobs are completed. Since these actors are stateless, the Orleans runtime transparently creates additional activations of them to handle increased workload. The actors set up timers to periodically poll for completed rendering jobs, and forward them to the Notification actors.
Once a game turn is completed and its video clips are generated, a Notification actor sends a message to the game clients running on devices. The Housekeeper actor sets up a timer that periodically wakes it up to detect abandoned games and clean up the persisted game session data which is no longer needed.
4.4 Database Session Pooling
A common problem in distributed systems is managing access to a shared resource, such as a database, queue, or hardware resource. An example is the case of N front-end or middle tier servers sharing access to a database with M shards. When each of the servers opens a connection to each of the shards, N×M database connections are created. With N and M in the hundreds, the number of connections explodes, which may exceed limitations of the network, such as the maximum number of ports per IP address of the network load balancer.
Orleans provides an easy way to implement a pool of shared connections. In this application, an actor type Shard is introduced to encapsulate an open connection to a database shard. Instead of opening direct connections, the application uses Shard actors as proxies for sending requests to the shards. The application has full control of the number of Shard actors, and thus of the database connections, by mapping each database shard to one or a few Shard actors via hashing. An added benefit of implementing the connection pool with virtual actors is the reliability of the proxies, as they are automatically reactivated after a server failure. In this scenario, Orleans is used to implement a stateful connection pool for sharing access to the limited resources in a dynamic, scalable, and fault tolerant way.
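A short Python sketch can illustrate both the connection arithmetic and the hashing step. The function names, the choice of CRC32 as the hash, and the `(shard_id, slot)` key format are assumptions for illustration only, since the text does not specify how the mapping is computed.

```python
import zlib


def direct_connections(n_servers, m_shards):
    """Every server opens a connection to every shard: N x M."""
    return n_servers * m_shards


def pooled_connections(m_shards, actors_per_shard):
    """Only Shard actors hold connections: one per actor."""
    return m_shards * actors_per_shard


def shard_actor_key(shard_id, request_key, actors_per_shard):
    """Pick one of the few Shard actors serving a given database shard.

    CRC32 is used because it is stable across processes, unlike Python's
    built-in hash() for strings, so every server picks the same actor.
    """
    slot = zlib.crc32(request_key.encode()) % actors_per_shard
    return (shard_id, slot)


# Example figures chosen for illustration: 200 servers, 300 shards.
direct = direct_connections(200, 300)   # 60000 connections
pooled = pooled_connections(300, 2)     # 600 connections
key = shard_actor_key(17, "player-42", actors_per_shard=2)
```

The returned key would serve as the identity of the Shard actor to call. The application changes `actors_per_shard` to trade connection count against per-shard throughput.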