Software as a Service, Infrastructure as a Service, Platform as a Service, Communications Platform as a Service, Video Conferencing as a Service, but what about Gaming as a Service? There have been a few attempts at Cloud Gaming, most notably Google’s recently launched Stadia. Stadia is no stranger to WebRTC, but can others leverage WebRTC in the same way?
Thanh Nguyen set out to see if this was possible with his open source project, CloudRetro. CloudRetro is based on the popular Go-based WebRTC library, Pion (thanks to Sean of Pion for helping review here). In this post, Thanh gives an architectural review of how he built the project along with some of the benefits and challenges he ran into along the way.
{“editor”, “chad hart“}
Introduction
Last year, Google announced Stadia, and it blew my mind. The idea is so unique and innovative. I kept questioning how it is even possible with the current state of technology. The motivation to demystify its technology spurred me to create an open-source version of Cloud Gaming. The result is fantastic. I would like to share my one-year adventure working on this project in the article below.
TLDR: the short slide version with highlights
Why Cloud Gaming is the future
I believe Cloud Gaming will soon become the new generation of not only games but also other fields of computer science. Cloud Gaming is the pinnacle of the client/server model. It maximizes backend control and minimizes frontend work by putting game logic on a remote server and streaming images/audio to the client. The server handles heavy processing so the client is no longer limited by hardware constraints.
Looking at Google Stadia, it essentially allows you to play AAA games (i.e. high-end blockbuster games) on an interface like YouTube. The same methodology can be applied to other heavy offline applications, such as operating systems or 2D/3D graphic design tools, so that we can run them consistently on low-spec devices across many platforms.
The future of this technology: imagine running Microsoft Windows 10 on a Chrome browser?
Cloud Gaming is technically challenging
Gaming is one of the rare applications that require continuous, fast user reaction. If we click a page and it occasionally takes 2 seconds to respond, that is still acceptable. Live broadcast video streams typically run many seconds behind, but still offer acceptable usability. However, if a game is frequently delayed by 500 ms, it becomes unplayable. The target is to achieve extremely low latency so that the gap between game input and media is as small as possible. Therefore, the traditional video streaming approach is not applicable here.
Cloud Gaming common pattern
The Open Source CloudRetro Project
I decided to create a POC of Cloud-Gaming so that I can verify whether it is possible with these tight network restrictions. I picked Golang for my POC because it is the language I am most familiar with and it turned out to work well for many other reasons. Go is simple with fast development speed. Go channels are excellent when dealing with concurrency and stream manipulation.
The project is CloudRetro.io: an open source, web-based Cloud Gaming service for retro games. The goal is to bring the most comfortable gaming experience and introduce network gameplay, like online multiplayer, to traditional retro games.
You can reference the entire project repo here: https://github.com/giongto35/cloud-game
CloudRetro Functionality
CloudRetro uses retro games to demonstrate the power of Cloud Gaming. It enables many unique gaming experiences.
- Portable Gaming experience
- Instant play when the page is opened; no download, no install
- Running on browser, mobile, so you don’t need any software to launch
- Gaming sessions can be shared across multiple devices and stored on cloud storage for next time
- Game is both streamed and playable and multiple users can join the same game:
- Crowdplay like TwitchPlayPokemon but more real time and seamless
- Online multiplayer for offline games without network setting. Samurai Showdown is now playable with 2 players over the network in CloudRetro
Infrastructure
Requirement and Tech Stack
Below is the list of requirements I set before starting the project.
1. Singleplayer:
This requirement may sound irrelevant or trivial, but it is one of my key findings that sets cloud gaming apart from traditional streaming services. If we focus on singleplayer, we can get rid of a centralized server or CDN because we don’t need to distribute the stream session to a massive number of users. Rather than uploading streams to an ingest server or passing packets to a centralized WebSocket server, the service streams to the user directly over a WebRTC peer connection.
2. Low Latency media stream
When I researched Stadia, I found some articles mentioning its use of WebRTC. I figured out that WebRTC is a remarkable technology and fits this cloud gaming use case nicely. WebRTC is a project that provides web browsers and mobile applications with Real-Time Communication via a simple API. It enables peer communication, is optimized for media, and has built-in standard codecs like VP8 and H264.
I prioritized delivering the smoothest experience to users over keeping high-quality graphics. Some loss is acceptable in the algorithm. On Google Stadia, there is an additional step to reduce image size on the server, and image frames are rescaled to higher quality before rendering to peers.
3. Distributed infrastructure with geographic routing.
No matter how optimized the compression algorithm and the code is, network is still the crucial factor contributing the most to latency. The architecture needs to have a mechanism to pair the closest server to the user to reduce Round Trip Time (RTT). The architecture contains a single coordinator and multiple streaming servers distributed around the world: US West, US East, Europe, Singapore, China. All streaming servers are fully isolated. The system can adjust its distribution when a server joins or leaves the network. Hence, under high traffic, adding more servers allows horizontal scaling.
4. Browser Compatible
Cloud Gaming shines the best when it demands the least from users. This means being able to run on a browser. Browsers help bring the most comfortable gaming experience to users by removing software and hardware installs. Browsers also help provide cross-platform flexibility across mobile and desktop. Fortunately, WebRTC has excellent support across different browsers.
5. Clear separation of Game interface and service
I see the cloud gaming service as a platform. One should be able to plug any game into the platform. Currently, I have integrated LibRetro with the Cloud Gaming service because LibRetro offers a beautiful gaming emulator interface for retro games like SNES, GBA, PS.
6. Room based mechanism for multiplayer, crowd play and deep-link to the game
CloudRetro enables many novel gameplays like CrowdPlay and Online MultiPlayer for retro games. If multiple users open the same deep-link on different machines, they will see the same running game as a video stream and even be able to join the game as any player.
Moreover, Game states are stored on cloud storage. This lets users continue their game at any time on any different device.
7. Horizontal scaling
Like every SaaS nowadays, it must be designed to be horizontally scalable. The coordinator-worker design enables adding more workers to serve more traffic.
8. Cloud Agnostic
CloudRetro infrastructure is hosted on various cloud providers (Digital Ocean, Alibaba, custom provider) to target different regions. I dockerize the infrastructure and configure network settings through bash script to avoid dependency on any one cloud provider. Combining this with WebRTC’s NAT traversal, we can gain the flexibility to deploy CloudRetro on any cloud platform and even on any user’s machines.
Architectural design
Worker: (or streaming server as referred above) spawns games, runs encoding pipeline, and streams the encoded media to users. Worker instances are distributed around the world, and each worker can handle multiple user sessions concurrently.
Coordinator: in charge of pairing the new user with the most appropriate worker for streaming. The coordinator interacts with workers over WebSocket.
Game state storage: central remote storage for all game states. This storage enables some essential functionalities such as remote saving/loading.
User flow
When a new user opens CloudRetro at steps 1 and 2 shown in the image below, the coordinator is requested for the frontend page along with the list of available workers. After that at step 3, the client calculates latencies to all candidates using an HTTP ping request. This list of latencies is later sent back to the coordinator so that it can determine the most suitable worker to serve the user. At step 4 below, the game is spawned. A WebRTC stream connection is established between the user and the designated worker.
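The selection step in this flow can be sketched in a few lines of Go. This is a minimal illustration, not CloudRetro's actual coordinator code: the function name `pickWorker`, the millisecond map, and the example hostnames are all assumptions made for the example.

```go
package main

import "fmt"

// pickWorker returns the address of the worker with the lowest reported
// latency. The second result is false when no measurements were supplied.
// (Hypothetical helper; the real coordinator may weigh other factors.)
func pickWorker(latencyMs map[string]int) (string, bool) {
	best, found := "", false
	bestMs := 0
	for addr, ms := range latencyMs {
		// Break ties by address so the result is deterministic.
		if !found || ms < bestMs || (ms == bestMs && addr < best) {
			best, bestMs, found = addr, ms, true
		}
	}
	return best, found
}

func main() {
	// Made-up latencies a client might report after pinging each worker.
	reported := map[string]int{
		"us-west.example.com":   180,
		"singapore.example.com": 35,
		"europe.example.com":    240,
	}
	if addr, ok := pickWorker(reported); ok {
		fmt.Println("selected worker:", addr)
	}
}
```

Measuring on the client matters because the coordinator cannot observe the network path between the user and each worker itself.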
Inside the worker
Inside a worker, game and streaming pipelines are kept isolated and exchange information via an interface. Currently, this communication is done by in-memory transmission over Golang Channels in the same process. Further segregation is the next goal – i.e., running the game independently in a different process.
The main pieces are:
- WebRTC: Client-facing component where user input comes in and the encoded media from the server goes out.
- Game Emulator: The game component. Thanks to Libretro library, the system is capable of running the game inside the same process and internally hooking media and input flow. In-game frames are captured and sent to the encoder.
- Image/Audio Encoder: The encoding pipeline, where it accepts media frames, encodes in the background, and outputs encoded images/audio.
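To make the separation between these pieces concrete, here is a small sketch of two stages that only talk to each other through channels. The types `Frame` and `Packet`, the function names, and the fake "encoding" are placeholders invented for this example; the real worker hooks Libretro and a VP8/H264 encoder instead.

```go
package main

import "fmt"

// Frame is a raw frame produced by the game; Packet is its encoded form.
// Both types are illustrative placeholders.
type Frame struct {
	Seq    int
	Pixels []byte
}

type Packet struct {
	Seq  int
	Size int
}

// runGame emits n frames and then closes the channel to signal shutdown.
func runGame(n int, out chan<- Frame) {
	defer close(out)
	for i := 0; i < n; i++ {
		out <- Frame{Seq: i, Pixels: make([]byte, 4)}
	}
}

// runEncoder converts frames to packets until the input channel is closed.
// The "encoding" is a stand-in: it only records the frame size.
func runEncoder(in <-chan Frame, out chan<- Packet) {
	defer close(out)
	for f := range in {
		out <- Packet{Seq: f.Seq, Size: len(f.Pixels)}
	}
}

// pipeline wires the two stages together and collects packets in order.
func pipeline(n int) []Packet {
	frames := make(chan Frame)
	packets := make(chan Packet)
	go runGame(n, frames)
	go runEncoder(frames, packets)
	var got []Packet
	for p := range packets {
		got = append(got, p)
	}
	return got
}

func main() {
	for _, p := range pipeline(3) {
		fmt.Printf("packet %d (%d bytes)\n", p.Seq, p.Size)
	}
}
```

Because each stage only sees a channel, either side could later be moved into a separate process without the other stage changing its logic.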
Implementation
CloudRetro relies on WebRTC as the backbone, so before going into details about my implementation in Golang, the first section is dedicated to introducing WebRTC technology. It is an awesome technology that greatly helps me achieve sub-second latency streaming.
WebRTC
WebRTC is designed to enable high-quality peer-to-peer connections on native mobile and browsers with simple APIs.
NAT Traversal
WebRTC is renowned for its NAT Traversal functionality. WebRTC is designed for peer-to-peer communication. It aims to find the most suitable direct route, avoiding NAT gateways and firewalls for peer communication, via a process named ICE. As part of this process, the WebRTC APIs find your public IP address using STUN servers and fall back to a relay server (TURN) when direct communication cannot be established.
However, CloudRetro doesn’t fully utilize this capability. Its peer connections are not between users and users but between users and cloud servers. The server side of the model has fewer restrictions on direct communication than a typical user device. This allows doing things like pre-opening inbound ports or using public IP addresses directly, as the server is not behind NAT.
I previously had ambitions of developing the project to become a game distribution platform for Cloud Gaming. The idea was to let game creators contribute games and streaming resources. Users would be paired with game creators’ providers directly. In this decentralized manner, CloudRetro is just a medium to connect third-party streaming resources with users, so it is more scalable when the burden of hosting does not rely on CloudRetro anymore. WebRTC NAT Traversal will play an important role when it eases the peer connection initialization on third-party streaming resources, making it effortless for the creator to join the network.
Video Compression
Video compression is an indispensable part of the pipeline that greatly contributes to a smooth streaming experience. Even though it is not compulsory to know all of VP8/H264’s video coding details, understanding the concepts helps to demystify streaming speed parameters, debug unexpected behavior, and tune the latency.
Video Compression for a streaming service is challenging because the algorithm needs to ensure the total encoding time + network transmission + decoding time is as small as possible. In addition, the encoding process needs to be in serial order and continuous. Some traditional encoding trade-offs are not applicable – like trading long encoding time for smaller file size and decoding time or compressing without order.
The idea of video compression is to omit non-essential bits of information while keeping an understandable level of fidelity for users. In addition to encoding individual static image frames, the algorithm infers the current frame from previous and future frames, so only the difference is sent. As you see in the Pacman example below, only the differential dots are transferred.
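The "send only the difference" idea can be shown with a toy example. This is not how VP8 or H264 actually work (they use motion estimation, transforms, and entropy coding); it is a simplified per-pixel delta, with `Change`, `diff`, and `apply` invented for illustration.

```go
package main

import "fmt"

// Change records one pixel that differs from the previous frame.
type Change struct {
	Index int
	Value byte
}

// diff returns the pixels of cur that differ from prev.
// Both frames must have the same length.
func diff(prev, cur []byte) []Change {
	var changes []Change
	for i := range cur {
		if cur[i] != prev[i] {
			changes = append(changes, Change{Index: i, Value: cur[i]})
		}
	}
	return changes
}

// apply rebuilds the current frame from the previous one plus the changes,
// which is what the receiving side would do.
func apply(prev []byte, changes []Change) []byte {
	cur := append([]byte(nil), prev...)
	for _, c := range changes {
		cur[c.Index] = c.Value
	}
	return cur
}

func main() {
	prev := []byte{0, 0, 1, 0, 0, 0}
	cur := []byte{0, 0, 0, 1, 0, 0} // the "dot" moved one pixel to the right
	changes := diff(prev, cur)
	fmt.Println(len(changes), "of", len(cur), "pixels sent")
	fmt.Println(apply(prev, changes))
}
```

The less the picture changes between frames, the fewer entries need to be sent, which is why mostly static retro game screens compress well.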
Audio Compression
Similarly, the audio compression algorithm omits data that cannot be perceived by humans. The audio codec with the best performance currently is Opus. Opus is designed to transmit audio over an ordered datagram protocol such as RTP (Real-time Transport Protocol). It produces lower latency than MP3 or AAC with higher quality. The delay is usually around 5 to 66.5 ms.
Pion, WebRTC in Golang
Pion is an open-source project that brings WebRTC to Golang. Rather than simply wrapping the native C++ WebRTC libraries, Pion is a native Golang implementation for better performance, better Golang integration, and version control on constitutive WebRTC protocols.
The library also provides sub-second latency streaming with many great built-ins. It has its own implementations of STUN, DTLS, SCTP, etc., and some experiments with QUIC and WebAssembly. This open-source library is itself a really good source of learning, with great documentation, network protocol implementations, and cool examples.
The Pion community, led by a very passionate creator, is lively and has many quality discussions about WebRTC. If you are interested in this technology, please join http://pion.ly/slack – you will learn many new things.
Write CloudRetro in Golang
Go Channel In Action
Thanks to Go’s beautiful channel design, event streaming and concurrency problems are greatly simplified. As in the diagram, there are multiple components running in parallel in different goroutines. Each component manages its own state and communicates over channels. Golang’s select statement enforces that one atomic event is processed each game tick. This means locking is unnecessary with this design. For example, when a user saves, a complete snapshot of the game state is required. This state needs to remain uninterrupted by running input until the save is complete. During each game tick, the backend can only process either a save operation or an input operation, so it is safe under concurrency.
```go
func (e *gameEmulator) gameUpdate() {
	for {
		select {
		case <-e.saveOperation:
			e.saveGameState()
		case key := <-e.input:
			e.updateGameState(key)
		case <-e.done:
			e.close()
			return
		}
	}
}
```
Fan-in / Fan-out
This Golang pattern perfectly matches my use case for CrowdPlay and multiplayer. Following this pattern, all user inputs in the same room are fanned in to a central input channel. Game media is then fanned out to all users in the same room. Hence, we achieve game state sharing between multiple gaming sessions from different users.
Synchronization between different sessions
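A minimal sketch of the fan-in / fan-out pattern follows. The room logic here is a stand-in (inputs are strings and a "frame" is just text), and the names `fanIn`, `fanOut`, and `runRoom` are assumptions for this example rather than CloudRetro's real identifiers.

```go
package main

import (
	"fmt"
	"sync"
)

// fanIn merges every player's input channel into one room-wide channel.
// The merged channel closes once all player channels are closed.
func fanIn(players ...<-chan string) <-chan string {
	merged := make(chan string)
	var wg sync.WaitGroup
	for _, p := range players {
		wg.Add(1)
		go func(p <-chan string) {
			defer wg.Done()
			for key := range p {
				merged <- key
			}
		}(p)
	}
	go func() {
		wg.Wait()
		close(merged)
	}()
	return merged
}

// fanOut copies every frame to each viewer, then closes the viewers.
func fanOut(frames <-chan string, viewers ...chan<- string) {
	for f := range frames {
		for _, v := range viewers {
			v <- f
		}
	}
	for _, v := range viewers {
		close(v)
	}
}

// runRoom feeds each player's keys into one shared game loop and returns
// the frames every viewer received.
func runRoom(playerKeys [][]string, viewerCount int) [][]string {
	var players []<-chan string
	for _, keys := range playerKeys {
		ch := make(chan string)
		players = append(players, ch)
		go func(keys []string) {
			defer close(ch)
			for _, k := range keys {
				ch <- k
			}
		}(keys)
	}
	inputs := fanIn(players...)

	// The "game": turn every input into one frame.
	frames := make(chan string)
	go func() {
		defer close(frames)
		for key := range inputs {
			frames <- "frame after " + key
		}
	}()

	received := make([][]string, viewerCount)
	var sinks []chan<- string
	var wg sync.WaitGroup
	for i := 0; i < viewerCount; i++ {
		ch := make(chan string)
		sinks = append(sinks, ch)
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for f := range ch {
				received[i] = append(received[i], f)
			}
		}(i)
	}
	fanOut(frames, sinks...)
	wg.Wait()
	return received
}

func main() {
	views := runRoom([][]string{{"A", "B"}, {"LEFT"}}, 2)
	for i, v := range views {
		fmt.Printf("viewer %d saw %d frames\n", i, len(v))
	}
}
```

Note that this simple `fanOut` blocks on the slowest viewer. A real streaming service would more likely drop frames for a slow peer than stall the whole room.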
Downsides of Golang
Golang isn’t perfect. Channels are slow. Compared to locks, Go channels are a simpler way to handle concurrency and streaming events, but they do not give the best performance. There is complex locking logic underneath a channel. Hence, I made some adjustments in the implementation, replacing channels with locks and atomic values in places to optimize performance.
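As a rough illustration of what swapping a channel for an atomic value or a lock can look like, consider the sketch below. It is a general Go technique, not the project's actual change; `inputState` and `snapshotStore` are hypothetical types made up for this example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// inputState holds the currently pressed buttons as a bitmask.
// Writers overwrite it; the game loop reads the latest value each tick.
type inputState struct {
	keys uint32
}

func (s *inputState) Set(keys uint32) { atomic.StoreUint32(&s.keys, keys) }
func (s *inputState) Get() uint32     { return atomic.LoadUint32(&s.keys) }

// snapshotStore guards a larger value with a mutex, for state that does
// not fit in a single atomic word.
type snapshotStore struct {
	mu   sync.RWMutex
	data []byte
}

// Save stores a private copy of b.
func (s *snapshotStore) Save(b []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data = append(s.data[:0], b...)
}

// Load returns a copy so callers cannot mutate the stored snapshot.
func (s *snapshotStore) Load() []byte {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return append([]byte(nil), s.data...)
}

func main() {
	var in inputState
	var wg sync.WaitGroup
	// Several "network" goroutines publish input concurrently.
	for i := uint32(1); i <= 4; i++ {
		wg.Add(1)
		go func(mask uint32) {
			defer wg.Done()
			in.Set(mask)
		}(i)
	}
	wg.Wait()
	fmt.Println("latest input bitmask:", in.Get())

	var store snapshotStore
	store.Save([]byte("state-1"))
	fmt.Println("snapshot:", string(store.Load()))
}
```

The trade-off is semantic as well as performance-related: an atomic value keeps only the latest input, whereas a channel queues every event, so this substitution only fits state where intermediate values may be skipped.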
In addition, the Golang garbage collector is uncontrollable, so sometimes there are some suspicious long pauses. This greatly hurts the realtime-ness of this streaming application.
CGO
The project uses some existing open-source Golang VP8/H264 libraries for media compression and Libretro for game emulators. All of these libraries are just Go wrappers around C libraries using CGO. There are some drawbacks, which you can read about in this blog post by Dave. The issues I’m facing are:
- A crash in CGO is not caught, even with Golang’s recover
- Performance bottlenecks are hard to pin down because granular issues under CGO cannot be detected
Conclusion
I achieved my goal of demystifying cloud gaming services and created a platform that helps me play nostalgic retro games with my friends online. This project would not have been possible without the Pion library and the support of the Pion community. I am extremely grateful for Pion and its intensive development. The simple APIs provided by WebRTC and Pion allowed for a smooth integration. My first proof of concept was released in the same week even though I had zero knowledge of peer-to-peer (P2P) beforehand.
Even though it is straightforward to integrate, P2P streaming is indeed a very challenging field in computer science. It has to deal with the complexity of long-standing network architecture like IP and NAT to create a peer session. I accumulated much valuable knowledge about networking and performance optimization while working on this project, so I recommend that everyone try building some P2P products with WebRTC.
CloudRetro serves all the use-cases I expected from my viewpoint as a retro gamer. However, I think there are many areas in the project that I can improve, such as making the network more reliable and performant, delivering higher graphic quality games, or sharing games between users. I’m working hard on that. Please follow and support the project if you like it.
{“author”: “Thanh Nguyen”}
Lr says
A similar idea was apparently implemented by linux-projects.org some months ago, where WebRTC is used to play RetroPie games running on the Raspberry Pi (audio + video) in the PC browser (with a local joypad). There are tutorials at https://linux-projects.org .
Ankit Gaurav says
Hi
I am Ankit. I went through your project and I want to know which cloud you are using and whether this can be implemented using Azure.
Jernej Jerin says
Interesting project, I especially liked the solution for selecting the worker with the lowest latency via ping check. As you were using WebRTC basically for client server communication, have you at the beginning thought about simply using WebTransport + WebCodecs? Based on my understanding they are probably the future for any low latency client/server architecture requirements and we may even see it soon being used by Google on project Stadia.
Daniel Bode says
Hello, I am completely new to running servers. Could you please send detailed instructions on how to install and run a CloudRetro server? This would be greatly appreciated.