Portable Gaming experience
- Instant play when the page is opened; no download, no install
- Running on browser, mobile, so you don’t need any software to launch
Gaming sessions can be shared across multiple devices and stored on cloud storage for next time
Game is both streamed and playable and multiple users can join the same game:
- Crowdplay like TwitchPlayPokemon but more real time and seamless
- Online multiplayer for offline games without network setting. Samurai Showdown is now playable with 2 players over the network in CloudRetro
Infrastructure
Requirement and Tech Stack
Below is the list of requirements I set before starting the project.
1. Singleplayer:
This requirement sounds not relevant and straightforward, but it’s one of my key findings that makes cloud gaming stand away from traditional streaming services. If we focus on singleplayer, we can get rid of a centralized server or CDN because we don’t need to distribute the stream session to massive users. Rather than uploading streams to an ingest server or passing packets to a centralized WebSocket server, the service streams to the user directly over a WebRTC peer connection.
2. Low Latency media stream
When I research about Stadia, some articles are mentioning the application of WebRTC. I figured out WebRTC is a remarkable technology and fits this cloud gaming use case nicely. WebRTC is a project that provides web browsers and mobile applications with Real-Time Communication via simple API. It enables peer communication and is optimized for media and has built-in standard codecs like VP8 and H264.
I prioritized delivering the smoothest experience to users over keeping high-quality graphics. Some loss is acceptable in the algorithm. On Google Stadia, there is an additional step to reduce image size on the server, and image frames are rescaled to higher quality before rendering to peers.
3. Distributed infrastructure with geographic routing.
No matter how optimized the compression algorithm and the code is, network is still the crucial factor contributing the most to latency. The architecture needs to have a mechanism to pair the closest server to the user to reduce Round Trip Time (RTT). The architecture contains a single coordinator and multiple streaming servers distributed around the world: US West, US East, Europe, Singapore, China. All streaming servers are fully isolated. The system can adjust its distribution when a server joins or leaves the network. Hence, under high traffic, adding more servers allows horizontal scaling.
4. Browser Compatible
Cloud Gaming shines the best when it demands the least from users. This means being able to run on a browser. Browsers help bring the most comfortable gaming experience to users by removing software and hardware installs. Browsers also help provide cross-platform flexibility across mobile and desktop. Fortunately, WebRTC has excellent support across different browsers.
5. Clear separation of Game interface and service
I see the cloud gaming service as a platform. One should be able to plug in any to the platform. Currently, I integrated LibRetro with the Cloud Gaming service because LibRetro offers a beautiful gaming emulator interface for retro games like SNES, GBA, PS.
6. Room based mechanism for multiplayer, crowd play and deep-link to the game
CloudRetro enables many novel gameplays like CrowdPlay and Online MultiPlayer for retro games. If multiple users open the same deep-link on different machines, they will see the same running game as a video stream and even be able to join the game as any player.
Moreover, Game states are stored on cloud storage. This lets users continue their game at any time on any different device.
7. Horizontal scaling
As every SAAS nowadays, it must be designed to be horizontally scalable. The coordinator-worker design enables adding more workers to serve more traffic.
8. Cloud Agnostic
CloudRetro infrastructure is hosted on various cloud providers (Digital Ocean, Alibaba, custom provider) to target different regions. I dockerize the infrastructure and configure network settings through bash script to avoid dependency on any one cloud provider. Combining this with WebRTC’s NAT traversal, we can gain the flexibility to deploy CloudRetro on any cloud platform and even on any user’s machines.
Architectural design
Worker: (or streaming server as referred above) spawns games, runs encoding pipeline, and streams the encoded media to users. Worker instances are distributed around the world, and each worker can handle multiple user sessions concurrently.
Coordinator: in charge of pairing the new user with the most appropriate worker for streaming. The coordinator interacts with workers over WebSocket.
Game state storage: central remote storage for all game states. This storage enables some essential functionalities such as remote saving/loading.
User flow
When a new user opens CloudRetro at steps 1 and 2 shown in the image below, the coordinator is requested for the frontend page along with the list of available workers. After that at step 3, the client calculates latencies to all candidates using an HTTP ping request. This list of latencies is later sent back to the coordinator so that it can determine the most suitable worker to serve the user. At step 4 below, the game is spawned. A WebRTC stream connection is established between the user and the designated worker.
Inside the worker
Inside a worker, game and streaming pipelines are kept isolated and exchange information via an interface. Currently, this communication is done by in-memory transmission over Golang Channels in the same process. Further segregation is the next goal – i.e., running the game independently in a different process.
The main pieces are:
- WebRTC: Client-facing component where user input comes in and the encoded media from the server goes out.
- Game Emulator: The game component. Thanks to Libretro library, the system is capable of running the game inside the same process and internally hooking media and input flow. In-game frames are captured and sent to the encoder.
- Image/Audio Encoder: The encoding pipeline, where it accepts media frames, encodes in the background, and outputs encoded images/audio.
Implementation
CloudRetro relies on WebRTC as the backbone, so before going into details about my implementation in Golang, the first section is dedicated to introducing WebRTC technology. It is an awesome technology that greatly helps me achieve sub-second latency streaming.
WebRTC
WebRTC is designed to enable high-quality peer-to-peer connections on native mobile and browsers with simple APIs.
NAT Traversal
WebRTC is renowned for its NAT Traversal functionality. WebRTC is designed for peer-to-peer communication. It aims to find the most suitable direct route avoiding NAT gateways and firewalls for peer communication via a process named ICE. As part of this process, the WebRTC APIs find your public IP Address using STUN servers and fallback to a relay server (TURN) when direct communication cannot be established.
However, CloudRetro doesn’t fully utilize this capability. Its peer connections are not between users and users but between users and cloud servers. The server side of the model has fewer restrictions on direct communication than a typical user device. This allows doingthings like pre-opening inbound ports or using public IP addresses directly as the server is not behind NAT.
I previously had ambitions of developing the project to become a game distribution platform for Cloud Gaming. The idea was to let game creators contribute games and streaming resources. Users would be paired with game creators’ providers directly. In this decentralized manner, CloudRetro is just a medium to connect third-party streaming resources with users, so it is more scalable when the burden of hosting does not rely on CloudRetro anymore. WebRTC NAT Traversal will play an important role when it eases the peer connection initialization on third-party streaming resources, making it effortless for the creator to join the network.
Video Compression
Video compression is an indispensable part of the pipeline that greatly contributes to a smooth streaming experience. Even though It is not compulsory to fully know all of VP8/H264’s video coding details, understanding its concepts helps to demystify streaming speed parameters, debug unexpected behavior, and tune the latency.
Video Compression for a streaming service is challenging because the algorithm needs to ensure the total encoding time + network transmission + decoding time is as small as possible. In addition, the encoding process needs to be in serial order and continuous. Some traditional encoding trade-offs are not applicable – like trading long encoding time for smaller file size and decoding time or compressing without order.
The idea of video compression is to omit non-essential bits of information while keeping an understandable level of fidelity for users. In addition to encoding individual static image frames, the algorithm made an inference for the current frame from previous and future frames, so only the difference is sent. As you see in the the Pacman example below, only the differential dots are transferred.
Audio Compression
Similarly, the audio compression algorithm omits data that cannot be perceived by humans. The audio codec with the best performance currently is Opus. Opus is designed to transmit audio wave over an ordered datagram protocol such as RTP (Real-time Transport Protocol). It produces lower latency than (mp3, aac) with higher quality. The delay is usually around 5~66.5 ms
Pion, WebRTC in Golang
Pion is an open-source project that brings WebRTC to Golang. Rather than simply wrapping the native C++ WebRTC libraries, Pion is a native Golang implementation for better performance, better Golang integration, and version control on constitutive WebRTC protocols.
The library also provides sub-second latency streaming with many great built-ins. It has its own implementation of STUN, DTLS, SCTP, etc… and some experiments of QUIC and WebAssembly. This open-source library itself is really a good source of learning with a great document, network protocol implementations and cool examples.
The Pion community, led by a very passionate creator, is lively and has many quality discussions about WebRTC. If you are interested in this technology, please join http://pion.ly/slack – you will learn many new things.
Write CloudRetro in Golang
Go Channel In Action
Thanks to Go’s beautiful channel design, event streaming and concurrency problems are greatly simplified. As in the diagram, there are multiple components running parallely in different GoRoutines. Each component manages its own state and communicates over channels. Golang’s select statement enforces that one atomic event is processed each game tick. This means locking is unnecessary with this design. For example, when a user saves, a completed snapshot of the game state is required. This state needs to remain uninterrupted by running input until the save is complete. During each game tick, the backend can only process either save operation or input operation, so it is concurrently safe.
|
func (e *gameEmulator) gameUpdate() { for { select { case <-e.saveOperation: e.saveGameState() case key := <-e.input: e.updateGameState(key) case <-e.done: e.close() return } } } |
Fan-in / Fan-out
This Golang pattern perfectly matches my use-case for CrowdPlay and Multiple Player. Following this pattern, all user inputs in the same room are fanned-in to a central input channel. Game media is then fanned-out to all users in the same room. Hence, we achieve game state sharing between multiple gaming sessions from different users.
Synchronization between different sessions
Downsides of Golang
Golang isn’t perfect. Channel is slow. Compared to lock, Go channel is just a simpler way to handle concurrency and streaming events, but channel does not give the best performance. There is a complex locking logic under a channel. Hence, I made some adjustments in the implementation by reapplying locks and atomic value in replacement of channels to optimize the performance.
In addition, the Golang garbage collector is uncontrollable, so sometimes there are some suspicious long pauses. This greatly hurts the realtime-ness of this streaming application.
CGO
The project uses some existing Golang open-source VP8/H264 Library for media compression and Libretro for Game emulators. All of these libraries are just wrappers of C library in Go by using CGO. There are some drawbacks that you can refer to this blog post by Dave. The issues I’m facing are:
- Crash in CGO is not caught, even with Golang Recovery
- Unable to define performance bottleneck when we cannot detect granular issues under CGO
Conclusion ... Continue reading