All posts tagged golang

Pion seemingly came out of nowhere to become one of the biggest and most active WebRTC communities. Pion is a Go-based set of WebRTC projects. Golang is an interesting language, but it is not among the most popular programming languages out there, so what is so special about Pion? Why are there so many developers involved in this project? 

To learn more about this project and how it came to be among the most active WebRTC organizations, I interviewed its founder, Sean Dubois. We discuss Sean’s background and how he got started in RTC; see the interview for the details. I really wanted to understand why he decided to build a new WebRTC project and why he continues to spend so much of his free time on it. ...

Software as a Service, Infrastructure as a Service, Platform as a Service, Communications Platform as a Service, Video Conferencing as a Service, but what about Gaming as a Service? There have been a few attempts at Cloud Gaming, most notably Google’s recently launched Stadia. Stadia is no stranger to WebRTC, but can others leverage WebRTC in the same way?

Thanh Nguyen set out to see if this was possible with his open source project, CloudRetro. CloudRetro is based on the popular Go-based WebRTC library, Pion (thanks to Sean of Pion for helping review here). In this post, Thanh gives an architectural review of how he built the project along with some of the benefits and challenges he ran into along the way.

{“editor”, “chad hart“}


Last year, Google announced Stadia, and it blew my mind. The idea is so unique and innovative. I kept questioning how it is even possible with the current state of technology. The motivation to demystify its technology spurred me to create an open-source version of Cloud Gaming. The result is fantastic. I would like to share my one-year adventure working on this project in the article below.

TLDR: the short slide version with highlights

Why Cloud Gaming is the future

I believe Cloud Gaming will soon become the next generation not only of games but also of other fields of computer science. Cloud Gaming is the pinnacle of the client/server model. It maximizes backend control and minimizes frontend work by putting game logic on a remote server and streaming images/audio to the client. The server handles heavy processing so the client is no longer limited by hardware constraints.

Looking at Google Stadia, it essentially allows you to play AAA games (i.e. high-end blockbuster games) on an interface like YouTube. The same methodology can be applied to other heavy offline applications, such as operating systems or 2D/3D graphic design tools, so that we can run them consistently on low-spec devices across many platforms.

The future of this technology: imagine running Microsoft Windows 10 in a Chrome browser.

Cloud Gaming is technically challenging

Gaming is one of the rare applications that require continuous, fast user reaction. If a page occasionally takes 2 seconds to respond to a click, that is still acceptable. Live broadcast video streams typically run many seconds behind but still offer acceptable usability. However, if a game is frequently delayed by 500 ms, it becomes unplayable. The target is extremely low latency, so that the gap between game input and the resulting media is as small as possible. Therefore, the traditional video streaming approach is not applicable here.

Cloud Gaming common pattern

The Open Source CloudRetro Project

I decided to create a POC of Cloud Gaming so that I could verify whether it is possible under these tight network restrictions. I picked Golang for my POC because it is the language I am most familiar with, and it turned out to work well for many other reasons. Go is simple and fast to develop in. Go channels are excellent for dealing with concurrency and stream manipulation.

The project is CloudRetro.io: an open-source, web-based Cloud Gaming service for retro games. The goal is to bring the most comfortable gaming experience and introduce network gameplay, like online multiplayer, to traditional retro games.

You can reference the entire project repo here: https://github.com/giongto35/cloud-game

CloudRetro Functionality

CloudRetro uses retro games to demonstrate the power of Cloud Gaming. It enables many unique gaming experiences.

  • Portable gaming experience
    • Instant play when the page is opened; no download, no install
    • Runs in the browser, including on mobile, so you don’t need any software to launch it
  • Gaming sessions can be shared across multiple devices and stored on cloud storage for next time
  • The game is both streamed and playable, and multiple users can join the same game:
    • Crowdplay like Twitch Plays Pokémon, but more real-time and seamless
    • Online multiplayer for offline games without network setup. Samurai Shodown is now playable with 2 players over the network in CloudRetro


    Requirements and Tech Stack

    Below is the list of requirements I set before starting the project.

    1. Singleplayer:

    This requirement may sound irrelevant or too obvious, but it is one of my key findings, and it sets cloud gaming apart from traditional streaming services. If we focus on singleplayer, we can get rid of a centralized server or CDN because we don’t need to distribute the stream session to a massive number of users. Rather than uploading streams to an ingest server or passing packets to a centralized WebSocket server, the service streams to the user directly over a WebRTC peer connection.

    2. Low Latency media stream

    When I researched Stadia, I found some articles mentioning its use of WebRTC. I figured out that WebRTC is a remarkable technology and fits this cloud gaming use case nicely. WebRTC is a project that provides web browsers and mobile applications with Real-Time Communication via a simple API. It enables peer-to-peer communication, is optimized for media, and has built-in standard codecs like VP8 and H264.

    I prioritized delivering the smoothest experience to users over keeping high-quality graphics. Some loss is acceptable in the algorithm. On Google Stadia, there is an additional step to reduce image size on the server, and image frames are upscaled to higher quality before being rendered to peers.

    3. Distributed infrastructure with geographic routing.

    No matter how optimized the compression algorithm and the code are, the network is still the crucial factor contributing the most to latency. The architecture needs a mechanism to pair the user with the closest server to reduce Round Trip Time (RTT). The architecture contains a single coordinator and multiple streaming servers distributed around the world: US West, US East, Europe, Singapore, China. All streaming servers are fully isolated. The system can adjust its distribution when a server joins or leaves the network. Hence, under high traffic, adding more servers allows horizontal scaling.

    4. Browser Compatible

    Cloud Gaming shines brightest when it demands the least from users. This means being able to run in a browser. Browsers help bring the most comfortable gaming experience to users by removing software and hardware installs. Browsers also help provide cross-platform flexibility across mobile and desktop. Fortunately, WebRTC has excellent support across different browsers.

    5. Clear separation of Game interface and service

    I see the cloud gaming service as a platform. One should be able to plug any game into the platform. Currently, I have integrated Libretro with the Cloud Gaming service because Libretro offers a beautiful gaming emulator interface for retro games like SNES, GBA, PS.

    6. Room based mechanism for multiplayer, crowd play and deep-link to the game

    CloudRetro enables many novel gameplay modes like CrowdPlay and Online MultiPlayer for retro games. If multiple users open the same deep-link on different machines, they will see the same running game as a video stream and even be able to join the game as any player.

    Moreover, game states are stored on cloud storage. This lets users continue their game at any time on any device.
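
To illustrate the room mechanism, here is a minimal sketch in Go of a registry keyed by the room ID carried in the deep-link. It is an assumption about how such a registry could look, not the CloudRetro implementation; the names `Rooms`, `NewRooms`, and `Join` are hypothetical.

```go
package main

import "sync"

// Rooms maps a room ID (taken from the deep-link) to the users in that room.
// All names here are hypothetical and not taken from the CloudRetro code base.
type Rooms struct {
	mu    sync.Mutex
	users map[string][]string
}

// NewRooms returns an empty registry.
func NewRooms() *Rooms {
	return &Rooms{users: make(map[string][]string)}
}

// Join adds user to the room, creating the room on first use.
// It returns the player slot (0 for the first user) and whether the room
// was created by this call.
func (r *Rooms) Join(roomID, user string) (slot int, created bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	players, exists := r.users[roomID]
	r.users[roomID] = append(players, user)
	return len(players), !exists
}
```

The first user to open a deep-link creates the room and would start the game; later users get the next player slot and attach to the already running stream.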

    7. Horizontal scaling

    Like every SaaS nowadays, it must be designed to be horizontally scalable. The coordinator-worker design enables adding more workers to serve more traffic.

    8. Cloud Agnostic

    CloudRetro infrastructure is hosted on various cloud providers (Digital Ocean, Alibaba, custom provider) to target different regions. I dockerize the infrastructure and configure network settings through a bash script to avoid depending on any one cloud provider. Combined with WebRTC’s NAT traversal, this gives us the flexibility to deploy CloudRetro on any cloud platform and even on any user’s machine.

    Architectural design

    Worker: (or streaming server, as referred to above) spawns games, runs the encoding pipeline, and streams the encoded media to users. Worker instances are distributed around the world, and each worker can handle multiple user sessions concurrently.

    Coordinator: in charge of pairing the new user with the most appropriate worker for streaming. The coordinator interacts with workers over WebSocket.

    Game state storage: central remote storage for all game states. This storage enables some essential functionalities such as remote saving/loading.

    User flow

    When a new user opens CloudRetro at steps 1 and 2 shown in the image below, the client requests the frontend page from the coordinator along with the list of available workers. Then, at step 3, the client measures the latency to every candidate using an HTTP ping request. This list of latencies is sent back to the coordinator so that it can determine the most suitable worker to serve the user. At step 4, the game is spawned and a WebRTC stream connection is established between the user and the designated worker.
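
The coordinator's choice in this flow can be sketched as follows. This is a minimal illustration that assumes the client reports one latency in milliseconds per worker address; the function name `pickWorker` and the tie-breaking rule are my assumptions, not the CloudRetro code.

```go
package main

// pickWorker returns the address of the worker with the lowest reported
// latency (in milliseconds). The boolean is false when the client reported
// no measurements. Ties are broken by address so the result is deterministic.
func pickWorker(latenciesMs map[string]int) (string, bool) {
	best, found := "", false
	for addr, ms := range latenciesMs {
		if !found || ms < latenciesMs[best] || (ms == latenciesMs[best] && addr < best) {
			best, found = addr, true
		}
	}
	return best, found
}
```

A real coordinator would probably also weigh worker load and availability; this sketch only covers the latency comparison described above.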

    Inside the worker

    Inside a worker, game and streaming pipelines are kept isolated and exchange information via an interface. Currently, this communication is done by in-memory transmission over Golang channels in the same process. Further segregation is the next goal, i.e., running the game independently in a different process.

    The main pieces are:

    • WebRTC: Client-facing component where user input comes in and the encoded media from the server goes out.
    • Game Emulator: The game component. Thanks to the Libretro library, the system is capable of running the game inside the same process and internally hooking media and input flow. In-game frames are captured and sent to the encoder.
    • Image/Audio Encoder: The encoding pipeline, which accepts media frames, encodes them in the background, and outputs encoded images/audio.
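
The hand-off between these pieces over channels can be sketched like this. It is a simplified illustration under my own assumptions: the types `Frame` and `Packet` and the functions `runEmulator` and `runEncoder` are hypothetical, and the encoder is a caller-supplied function standing in for a real VP8/H264 encoder.

```go
package main

// Frame is a raw frame captured from the emulator (hypothetical type).
type Frame struct {
	Seq    int
	Pixels []byte
}

// Packet is an encoded frame ready to be handed to WebRTC (hypothetical type).
type Packet struct {
	Seq  int
	Data []byte
}

// runEmulator stands in for the game: it emits n frames and then closes
// its output channel.
func runEmulator(n int) <-chan Frame {
	out := make(chan Frame)
	go func() {
		defer close(out)
		for i := 0; i < n; i++ {
			out <- Frame{Seq: i, Pixels: []byte{byte(i)}}
		}
	}()
	return out
}

// runEncoder reads frames in order, encodes each one in its own goroutine
// with the supplied function, and emits packets until the frame channel closes.
func runEncoder(frames <-chan Frame, encode func([]byte) []byte) <-chan Packet {
	out := make(chan Packet)
	go func() {
		defer close(out)
		for f := range frames {
			out <- Packet{Seq: f.Seq, Data: encode(f.Pixels)}
		}
	}()
	return out
}
```

Because each stage only sees a channel, the emulator could later move to a separate process without changing the encoder, which matches the segregation goal mentioned above.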


    CloudRetro relies on WebRTC as the backbone, so before going into details about my implementation in Golang, this first section introduces WebRTC technology. It is an awesome technology that greatly helped me achieve sub-second latency streaming.


    WebRTC is designed to enable high-quality peer-to-peer connections on native mobile and browsers with simple APIs.

    NAT Traversal

    WebRTC is renowned for its NAT Traversal functionality. WebRTC is designed for peer-to-peer communication. It aims to find the most suitable direct route between peers, working around NAT gateways and firewalls, via a process named ICE. As part of this process, the WebRTC APIs find your public IP address using STUN servers and fall back to a relay server (TURN) when direct communication cannot be established.

    However, CloudRetro doesn’t fully utilize this capability. Its peer connections are not between users and users but between users and cloud servers. The server side of the model has fewer restrictions on direct communication than a typical user device. This allows doing things like pre-opening inbound ports or using public IP addresses directly, as the server is not behind NAT.

    I previously had ambitions of developing the project to become a game distribution platform for Cloud Gaming. The idea was to let game creators contribute games and streaming resources. Users would be paired with game creators’ providers directly. In this decentralized manner, CloudRetro is just a medium to connect third-party streaming resources with users, so it is more scalable because the burden of hosting no longer falls on CloudRetro. WebRTC NAT Traversal would play an important role because it eases peer connection initialization on third-party streaming resources, making it effortless for creators to join the network.

    Video Compression

    Video compression is an indispensable part of the pipeline that greatly contributes to a smooth streaming experience. Even though it is not compulsory to know all of VP8/H264’s video coding details, understanding the concepts helps to demystify streaming speed parameters, debug unexpected behavior, and tune the latency.

    Video Compression for a streaming service is challenging because the algorithm needs to ensure the total encoding time + network transmission + decoding time is as small as possible. In addition, the encoding process needs to be in serial order and continuous. Some traditional encoding trade-offs are not applicable – like trading long encoding time for smaller file size and decoding time or compressing without order.

    The idea of video compression is to omit non-essential bits of information while keeping an understandable level of fidelity for users. In addition to encoding individual static image frames, the algorithm infers the current frame from previous and future frames, so only the difference is sent. As you can see in the Pacman example below, only the differential dots are transferred.
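
To make the "send only the difference" idea concrete, here is a toy sketch in Go. It is deliberately naive and is not how VP8 or H264 work internally (real codecs use block-based motion prediction and transform coding); the names `Change`, `diffFrames`, and `applyDiff` are hypothetical.

```go
package main

// Change records one pixel whose value differs from the previous frame.
type Change struct {
	Index int
	Value byte
}

// diffFrames returns the pixels of cur that differ from prev.
// Both frames must have the same length.
func diffFrames(prev, cur []byte) []Change {
	var changes []Change
	for i := range cur {
		if cur[i] != prev[i] {
			changes = append(changes, Change{Index: i, Value: cur[i]})
		}
	}
	return changes
}

// applyDiff rebuilds the current frame from a copy of the previous frame
// plus the list of changes, which is what the receiving side would do.
func applyDiff(prev []byte, changes []Change) []byte {
	cur := append([]byte(nil), prev...)
	for _, c := range changes {
		cur[c.Index] = c.Value
	}
	return cur
}
```

When most of the screen is static, the list of changes is much smaller than the full frame, which is the saving that inter-frame prediction exploits.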

    Audio Compression

    Similarly, the audio compression algorithm omits data that cannot be perceived by humans. The audio codec with the best performance currently is Opus. Opus is designed to transmit audio over an ordered datagram protocol such as RTP (Real-time Transport Protocol). It produces lower latency than MP3 and AAC, with higher quality. The delay is usually around 5 to 66.5 ms.

    Pion, WebRTC in Golang

    Pion is an open-source project that brings WebRTC to Golang. Rather than simply wrapping the native C++ WebRTC libraries, Pion is a native Golang implementation for better performance, better Golang integration, and control over the versions of the constituent WebRTC protocols.

    The library also provides sub-second latency streaming with many great built-ins. It has its own implementation of STUN, DTLS, SCTP, etc… and some experiments with QUIC and WebAssembly. This open-source library is itself a really good learning resource, with great documentation, network protocol implementations, and cool examples.

    The Pion community, led by a very passionate creator, is lively and has many quality discussions about WebRTC. If you are interested in this technology, please join http://pion.ly/slack – you will learn many new things.

    Write CloudRetro in Golang

    Go Channel In Action

    Thanks to Go’s beautiful channel design, event streaming and concurrency problems are greatly simplified. As in the diagram, there are multiple components running in parallel in different goroutines. Each component manages its own state and communicates over channels. Golang’s select statement ensures that one atomic event is processed each game tick, which means locking is unnecessary with this design. For example, when a user saves, a complete snapshot of the game state is required. This state must not be modified by incoming input until the save is complete. During each game tick, the backend can process either a save operation or an input operation, but not both, so it is concurrency-safe.
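
A minimal sketch of that select loop, under my own assumptions about the shape of the state: the `Game` type, `runGame`, and the channel layout are hypothetical and much simpler than the real emulator loop.

```go
package main

// Game is a toy game state: it just accumulates the inputs it has seen.
type Game struct {
	Inputs []byte
}

// Snapshot returns an independent copy of the state.
func (g *Game) Snapshot() []byte {
	return append([]byte(nil), g.Inputs...)
}

// runGame is the only goroutine that touches the state. Each loop iteration
// handles exactly one event, either an input or a save request, so a
// snapshot can never observe a half-applied input and no mutex is needed.
func runGame(inputs <-chan byte, saves <-chan chan<- []byte, done <-chan struct{}) {
	g := &Game{}
	for {
		select {
		case in := <-inputs:
			g.Inputs = append(g.Inputs, in)
		case reply := <-saves:
			reply <- g.Snapshot()
		case <-done:
			return
		}
	}
}
```

A save request carries its own reply channel, so the caller gets the snapshot back without sharing memory with the game goroutine.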

    Fan-in / Fan-out

    This Golang pattern perfectly matches my use case for CrowdPlay and multiplayer. Following this pattern, all user inputs in the same room are fanned in to a central input channel. Game media is then fanned out to all users in the same room. Hence, we achieve game state sharing between multiple gaming sessions from different users.

    Synchronization between different sessions
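
The fan-in / fan-out pattern can be sketched in a few lines of Go. This is a generic illustration with hypothetical names (`fanIn`, `fanOut`) and string payloads, not the CloudRetro room code.

```go
package main

import "sync"

// fanIn merges the input channels of all players in a room into one channel.
// The merged channel is closed once every player channel has been closed.
func fanIn(players ...<-chan string) <-chan string {
	merged := make(chan string)
	var wg sync.WaitGroup
	for _, p := range players {
		wg.Add(1)
		go func(p <-chan string) {
			defer wg.Done()
			for in := range p {
				merged <- in
			}
		}(p)
	}
	go func() {
		wg.Wait()
		close(merged)
	}()
	return merged
}

// fanOut copies every media item to each viewer and closes the viewer
// channels when the media source is closed. In this sketch a slow viewer
// blocks the others; a real service would likely drop frames instead.
func fanOut(media <-chan string, viewers ...chan<- string) {
	for m := range media {
		for _, v := range viewers {
			v <- m
		}
	}
	for _, v := range viewers {
		close(v)
	}
}
```

The emulator reads from the single merged input channel, and its encoded output is the `media` channel that `fanOut` distributes to every session in the room.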

    Downsides of Golang

    Golang isn’t perfect. Channels are slow. Compared to locks, Go channels are a simpler way to handle concurrency and streaming events, but they do not give the best performance. There is complex locking logic underneath a channel. Hence, I made some adjustments in the implementation, replacing channels with locks and atomic values in places to optimize performance.

    In addition, the Golang garbage collector is hard to control, so sometimes there are suspiciously long pauses. This greatly hurts the real-time behavior of this streaming application.
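
As an illustration of swapping a channel for an atomic value, here is a minimal sketch that assumes the shared data is a joypad bitmask where only the latest value matters. The `InputState` type is hypothetical and not taken from the CloudRetro source.

```go
package main

import "sync/atomic"

// InputState holds the latest joypad bitmask for one player.
// The network goroutine writes it; the emulator goroutine reads it each tick.
type InputState struct {
	keys uint32
}

// Set publishes the newest key state, overwriting the previous one.
func (s *InputState) Set(keys uint32) {
	atomic.StoreUint32(&s.keys, keys)
}

// Get returns the most recently published key state.
func (s *InputState) Get() uint32 {
	return atomic.LoadUint32(&s.keys)
}
```

This trades the queueing behavior of a channel for a single overwritten value, so it only suits state-like data; events that must each be delivered still need a channel or a locked queue.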
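
One way to check whether the collector is behind such pauses is to compare the runtime's GC counters before and after a stretch of work. The sketch below uses the standard `runtime.ReadMemStats` call; the helper name `measureGC` is hypothetical, and this is a diagnostic aid rather than something CloudRetro is known to do.

```go
package main

import (
	"runtime"
	"time"
)

// measureGC runs work and reports how many garbage collection cycles
// completed while it ran and the total stop-the-world pause they caused.
func measureGC(work func()) (cycles uint32, pause time.Duration) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	work()
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC, time.Duration(after.PauseTotalNs - before.PauseTotalNs)
}
```

For example, `measureGC(runtime.GC)` forces one collection and reports its pause. `ReadMemStats` itself briefly stops the world, so it is better called around a batch of ticks than on every tick.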


    The project uses existing open-source Golang VP8/H264 libraries for media compression and Libretro for game emulators. All of these libraries are just Go wrappers around C libraries using CGO. There are some drawbacks, which you can read about in this blog post by Dave. The issues I’m facing are:

    • A crash inside CGO is not caught, even with Golang’s recover
    • Performance bottlenecks are hard to identify because granular issues under CGO cannot be detected

    Conclusion ...