Captain's Log

Making WebRTC eat its tail July 24, 2024

[Edit: 2024-08-07 Adding the second visit to the wizards and some code]

Learning New Magic

WebRTC is a weird protocol. On one hand, it has the power to connect to every device on the network and stream video, audio and data. On the other, it’s incomplete. It depends on “magic” - an unspecified signaling server that teleports connection candidates between the peers until a connection is made.

This magic was new to me so I sought advice. My quest led me to a sect of wizards who practice this kind of sorcery. They call themselves pions and they share a Golang WebRTC library to use in their quests. The pions can be found in the #pion channel on the Gophers Slack. I messaged them with a simple query: How to conjure the signaling magic?

The answer came quickly and succinctly: HTTP’s websockets.

So we did that, adopting the gorilla mux and using it to accept incoming websocket connections from clients and servers. To simplify things we assumed the peers are honest about their fingerprints (aka public keys). To prevent masquerading, both client and server peek inside the received candidates to ensure the right fingerprint is used for encryption.
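The peeking can be sketched in a few lines. This is a minimal illustration, not Terminal7's or peerbook's code: it assumes the fingerprint arrives as an `a=fingerprint:` attribute line in the SDP text, and the function names and sample values are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// extractFingerprint returns the value of the first "a=fingerprint:" line
// found in an SDP blob, e.g. "sha-256 AB:CD:EF". Hypothetical helper.
func extractFingerprint(sdp string) (string, error) {
	for _, line := range strings.Split(sdp, "\n") {
		line = strings.TrimSpace(line) // also drops the "\r" of a CRLF ending
		if strings.HasPrefix(line, "a=fingerprint:") {
			return strings.TrimPrefix(line, "a=fingerprint:"), nil
		}
	}
	return "", errors.New("no a=fingerprint line in SDP")
}

// fingerprintMatches reports whether the SDP carries the fingerprint we
// expect from this peer. The comparison ignores letter case.
func fingerprintMatches(sdp, expected string) bool {
	got, err := extractFingerprint(sdp)
	return err == nil && strings.EqualFold(got, expected)
}

func main() {
	sdp := "v=0\r\na=fingerprint:sha-256 AB:CD:EF\r\n"
	fmt.Println(fingerprintMatches(sdp, "sha-256 ab:cd:ef"))
	fmt.Println(fingerprintMatches(sdp, "sha-256 00:11:22"))
}
```

A peer that fails the check is simply dropped before any candidates are relayed.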

The Plot Thickens

It worked well until we needed to add address book management to Terminal7. Without it the user was forced to read an email and click on a one time link to verify a gate. We wanted a simpler flow where the user could click on an unverified gate and enter an OTP to verify it. For that, we needed an authenticated connection.

Tutankhamun and his tail-eating snake

It was down to either authenticating the websockets or replacing them with WebRTC. We chose to replace them, as having double authentication will surely lead to hell. Why complicate things when WebRTC fingerprints serve us so well?

The plan was to add an HTTP WebRTC server to peerbook complete with management commands to allow the client to verify, edit, delete, register, ping, offer and forward candidates. Some commands require a One Time Password, some require a target and some require both. To identify and authenticate peers, peerbook peeks inside the connection candidates.
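One way to express "some need an OTP, some need a target, some need both" is a small requirements table. The sketch below is an assumption throughout: the `Command` shape, the field names and especially which command needs what are made up for illustration and are not peerbook's actual rules.

```go
package main

import (
	"errors"
	"fmt"
)

// Command is a hypothetical shape for a message on the management channel.
type Command struct {
	Type   string // e.g. "ping", "offer", "delete"
	Target string // fingerprint of the peer the command acts on, if any
	OTP    string // one-time password, if any
}

// requirement records what a command type must carry.
type requirement struct{ otp, target bool }

// Illustrative assignments only; which command needs what is an assumption.
var requirements = map[string]requirement{
	"ping":   {},
	"offer":  {target: true},
	"delete": {otp: true, target: true},
}

var (
	errUnknownCommand = errors.New("unknown command")
	errMissingOTP     = errors.New("command requires an OTP")
	errMissingTarget  = errors.New("command requires a target")
)

// validate checks that cmd carries every field its type requires.
func validate(cmd Command) error {
	req, ok := requirements[cmd.Type]
	switch {
	case !ok:
		return errUnknownCommand
	case req.otp && cmd.OTP == "":
		return errMissingOTP
	case req.target && cmd.Target == "":
		return errMissingTarget
	}
	return nil
}

func main() {
	fmt.Println(validate(Command{Type: "ping"}))
	fmt.Println(validate(Command{Type: "offer"}))
	fmt.Println(validate(Command{Type: "delete", Target: "AB:CD", OTP: "123456"}))
}
```

Keeping the rules in one table means a new command is a one-line change and the checks stay out of the command handlers.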

We already had similar code in the back-end (aka webexec) and we refactored it into a standalone library: a library with a configurable authentication back-end that can serve as the HTTP-based magic for both webexec and peerbook. The protocol this HTTP server used was simple and predated the WHIP standard. Being non-standard, it had to be refactored.
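A configurable authentication back-end usually boils down to an interface. The following is a rough sketch under that assumption; `AuthBackend`, `allowList` and `Signaler` are hypothetical names invented here, not the library's real types.

```go
package main

import "fmt"

// AuthBackend is a hypothetical interface: each host application plugs in
// its own way of deciding whether a fingerprint is welcome.
type AuthBackend interface {
	IsAuthorized(fingerprint string) bool
}

// allowList is one possible back-end: a fixed set of known fingerprints.
type allowList map[string]bool

func (a allowList) IsAuthorized(fp string) bool { return a[fp] }

// Signaler stands in for the shared library: the signaling logic is the
// same everywhere, only the back-end differs.
type Signaler struct{ Auth AuthBackend }

// Accept rejects peers the back-end does not recognize.
func (s Signaler) Accept(fingerprint string) error {
	if !s.Auth.IsAuthorized(fingerprint) {
		return fmt.Errorf("peer %q is not authorized", fingerprint)
	}
	return nil
}

func main() {
	s := Signaler{Auth: allowList{"AB:CD": true}}
	fmt.Println(s.Accept("AB:CD"))
	fmt.Println(s.Accept("00:11"))
}
```

With this split, one program could back the interface with a local list and another with a database lookup, without touching the signaling code.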

Validation

Like all plans, it needed validation so I went back to the wizards. The chief pion liked it and suggested I look at the WHIP RFC and at an ICETCP server to ensure connectivity. The github.com/pion/webrtc library already had all the needed magic:

    ICETCPListener, err := net.Listen("tcp", addr)
    tcpMux := webrtc.NewICETCPMux(nil, ICETCPListener, 8)

These two lines start an ICETCP server on addr. Next, the details of this server are added to the webrtc API:

    var settingEngine webrtc.SettingEngine
    settingEngine.SetNetworkTypes([]webrtc.NetworkType{
        webrtc.NetworkTypeTCP4,
        webrtc.NetworkTypeTCP6,
    })
    settingEngine.SetICETCPMux(tcpMux)
    api := webrtc.NewAPI(webrtc.WithSettingEngine(settingEngine))

Peer connections created through api will include the ICETCP server’s address in all their answers.

Enter The LLaMa

The WHIP RFC was a great read. It’s a simple protocol that defines how to establish a WebRTC connection over HTTP. It uses POST and PATCH requests to exchange connection offers and candidates and establish a connection.
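To make the exchange concrete, here is a minimal sketch of the HTTP side only, using nothing but the standard library. The paths, the fixed session id and the `answerFunc` stand-in are assumptions for illustration; a real server would hand the offer to a WebRTC stack and should start from pion's example. The status codes and content types follow my reading of the WHIP spec: the offer is POSTed as application/sdp and answered with 201 Created plus a Location header, and extra candidates are PATCHed to that location as application/trickle-ice-sdpfrag.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// answerFunc turns an SDP offer into an SDP answer. Hypothetical stand-in
// for the WebRTC stack.
type answerFunc func(offer string) (string, error)

func newWHIPHandler(answer answerFunc) http.Handler {
	mux := http.NewServeMux()
	// POST /whip carries the SDP offer; the reply carries the SDP answer.
	mux.HandleFunc("/whip", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		if r.Header.Get("Content-Type") != "application/sdp" {
			http.Error(w, "expected application/sdp", http.StatusUnsupportedMediaType)
			return
		}
		offer, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "cannot read offer", http.StatusBadRequest)
			return
		}
		sdp, err := answer(string(offer))
		if err != nil {
			http.Error(w, "bad offer", http.StatusBadRequest)
			return
		}
		w.Header().Set("Content-Type", "application/sdp")
		w.Header().Set("Location", "/whip/session/1") // fixed id, illustration only
		w.WriteHeader(http.StatusCreated)
		io.WriteString(w, sdp)
	})
	// PATCH on the session URL trickles further ICE candidates.
	mux.HandleFunc("/whip/session/", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPatch {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		if r.Header.Get("Content-Type") != "application/trickle-ice-sdpfrag" {
			http.Error(w, "expected an SDP fragment", http.StatusUnsupportedMediaType)
			return
		}
		w.WriteHeader(http.StatusNoContent) // a real server would add the candidates here
	})
	return mux
}

// send fires one in-memory request at h and returns the recorded response.
func send(h http.Handler, method, path, contentType, body string) *httptest.ResponseRecorder {
	req := httptest.NewRequest(method, path, strings.NewReader(body))
	req.Header.Set("Content-Type", contentType)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec
}

func main() {
	h := newWHIPHandler(func(offer string) (string, error) { return "v=0\r\n", nil })
	rec := send(h, "POST", "/whip", "application/sdp", "v=0\r\n")
	fmt.Println(rec.Code, rec.Header().Get("Location"))
	rec = send(h, "PATCH", "/whip/session/1", "application/trickle-ice-sdpfrag", "a=candidate:...")
	fmt.Println(rec.Code)
}
```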

As the protocol is well defined and simple it was a great opportunity to get a LLaMa involved. I’ve been doing LLaMa Driven Development for a while and I’ve been looking for a project to take it further. With the WHIP RFC I could start with a spec and get the LLaMa to write the code, tests and docs. It took a lot of trial and error but eventually I got the LLaMa to do it all. I broke it down into four steps:

  • Read the RFC and draft the API reference.
  • Code the server contract tests based on this reference.
  • Use these tests to get the LLaMa to write the server code.
  • Construct client code based on the API reference.

In the short term, it wasn’t cost effective. I invested a lot of time finding the right prompt and experimenting. For example, what verb is best for the llama? program | code | write | craft | author ? I discovered that just like IRL llamas, they can be very stubborn. Getting the output right often felt like trying to hold a llama’s feet to the fire while it keeps kicking and kicking.

Using the WHIP server (if you need one, it’s best to start from pion’s example), Terminal7 can establish a WebRTC connection with peerbook, open a management data channel and start sending and receiving commands and information.

As for eating its tail, one of the management commands peerbook now supports is offer, which forwards a connection offer to a target peer. Before forwarding the offer, peerbook validates that:

  • source & target belong to the same user
  • source & target are verified

When all boxes are checked, the offer is forwarded and candidates are sent back and forth until a direct connection is established.
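The two checks before forwarding can be sketched as a single function. This is an illustration under assumptions: the `Peer` record, the in-memory map standing in for the address book and the error values are all hypothetical, not peerbook's actual data model.

```go
package main

import (
	"errors"
	"fmt"
)

// Peer is a hypothetical address-book record keyed by fingerprint.
type Peer struct {
	User     string
	Verified bool
}

var (
	errUnknownPeer   = errors.New("unknown peer")
	errDifferentUser = errors.New("source and target belong to different users")
	errUnverified    = errors.New("source and target must both be verified")
)

// canForward returns nil only when an offer from source may be forwarded
// to target: both are known, owned by the same user, and verified.
func canForward(book map[string]Peer, source, target string) error {
	src, ok := book[source]
	if !ok {
		return errUnknownPeer
	}
	dst, ok := book[target]
	if !ok {
		return errUnknownPeer
	}
	if src.User != dst.User {
		return errDifferentUser
	}
	if !src.Verified || !dst.Verified {
		return errUnverified
	}
	return nil
}

func main() {
	book := map[string]Peer{
		"AA": {User: "alice", Verified: true},
		"BB": {User: "alice", Verified: true},
		"CC": {User: "bob", Verified: true},
	}
	fmt.Println(canForward(book, "AA", "BB"))
	fmt.Println(canForward(book, "AA", "CC"))
}
```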