My Guide to How Live Streaming Media in the Browser Will Work in My Project

The Challenge

With this guide, I aim to explain to myself the higher-level flow of how live streaming media (audio, in my case) is sent by my project’s website from a speaker to a listener.

What is Live Streaming Media?

Live Streaming Media looks like …

  • Watching the Kavanaugh court proceedings live on the New York Times
  • Listening to a live Carnegie Hall concert on radio station WQXR’s website
  • or watching a live YouTube feed of cats playing or sleeping…
  • [I did all of these things recently]

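In each of these cases, you are enjoying live streaming media in the browser. I assumed the technology behind all of this was WebRTC, but perhaps in some cases they aren’t using it? (Large one-to-many broadcasts are often delivered over HTTP with streaming formats such as HLS or MPEG-DASH; WebRTC is aimed at low-latency, real-time streams between peers.)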

What is WebRTC?

Coming. More here. https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API/Signaling_and_video_calling

And here. https://www.html5rocks.com/en/tutorials/webrtc/basics/

And here. https://codelabs.developers.google.com/codelabs/webrtc-web/#0

What kind of server do I need to use WebRTC?

From what I can tell, I need a JavaScript file (called server.js) that describes the behavior I want my remote server to carry out. I’ll be using the word “server” a lot: sometimes I mean my server.js file, and other times the remote server that hosts all my files.

Based on further research, it looks like people use the term “signaling server” to describe the kind of behavior I want from my JavaScript file (?).

A signaling server takes care of a lot of things when one user “calls” another, asking for a live media stream. (The live media stream can be video, audio or both.)

What does a Signaling Server do?

A Signaling Server fulfills an essential function that the WebRTC API does not handle: getting two users connected in the first place.

This type of server lets two users transfer initial introductory messages before they have access to a WebRTC peer connection. This type of communication is called signaling.
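To make “signaling” concrete for myself, here is a minimal sketch of the relaying job in TypeScript, with no networking at all. The names `SignalingRelay`, `join`, `leave` and `send` are hypothetical (this is not the Socket.io API), and each connected user is modeled as a callback that receives messages. The point is that the server only forwards opaque introductory messages from one user to another; it never touches the media itself.

```typescript
// A hypothetical, in-memory model of a signaling server's core job:
// forwarding messages between users. A real server would sit behind a
// network transport (Socket.io, WebSockets, ...) instead of callbacks.

type Deliver = (from: string, payload: string) => void;

class SignalingRelay {
  // Maps each connected user's id to the callback that "sends" to them.
  private users = new Map<string, Deliver>();

  join(userId: string, deliver: Deliver): void {
    this.users.set(userId, deliver);
  }

  leave(userId: string): void {
    this.users.delete(userId);
  }

  // Forward an opaque payload; the relay never inspects its contents.
  // Returns false if the recipient is not connected.
  send(from: string, to: string, payload: string): boolean {
    const deliver = this.users.get(to);
    if (deliver === undefined) {
      return false;
    }
    deliver(from, payload);
    return true;
  }
}

// Usage: User 1 "calls" User 2 by sending an introductory message.
const relay = new SignalingRelay();
const inbox2: string[] = [];
relay.join("user1", () => {});
relay.join("user2", (from, payload) => inbox2.push(`${from}: ${payload}`));

const delivered = relay.send("user1", "user2", "hello, want to connect?");
const undeliverable = relay.send("user1", "user3", "anyone there?");
console.log(inbox2, delivered, undeliverable);
```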

Do I have to build it myself?

Yes. The people who wrote the WebRTC APIs (there are three: MediaStream, RTCPeerConnection and RTCDataChannel) chose not to handle initial connections, leaving developers free to set this part up however they please.

Do I really have to build it myself?

Not literally. Most examples I’ve found use the Socket.io library to handle signaling for them. So will I!
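Many Socket.io signaling examples first group the two users into a named “room”, so the server knows whose messages should go to whom; as far as I can tell, the Google codelab linked above does this with a “create or join” step. Here is a sketch of just that bookkeeping, without Socket.io. `RoomRegistry`, `createOrJoin`, `peersOf` and the two-person limit are my own hypothetical choices for a one-speaker, one-listener call, not part of Socket.io’s API.

```typescript
// Hypothetical room bookkeeping for a signaling server: the first user
// to ask for a room creates it, the second joins it, and anyone after
// that is turned away because a call here has exactly two peers.

type JoinResult = "created" | "joined" | "full";

class RoomRegistry {
  // Room name -> ids of the users currently in it.
  private rooms = new Map<string, string[]>();
  private readonly maxPeers: number;

  constructor(maxPeers = 2) {
    this.maxPeers = maxPeers;
  }

  createOrJoin(room: string, userId: string): JoinResult {
    const members = this.rooms.get(room);
    if (members === undefined) {
      this.rooms.set(room, [userId]);
      return "created";
    }
    if (members.length >= this.maxPeers) {
      return "full";
    }
    members.push(userId);
    return "joined";
  }

  // Everyone in the room except the sender: the users a message should reach.
  peersOf(room: string, userId: string): string[] {
    return (this.rooms.get(room) ?? []).filter((id) => id !== userId);
  }
}

const rooms = new RoomRegistry();
const r1 = rooms.createOrJoin("concert", "speaker");
const r2 = rooms.createOrJoin("concert", "listener");
const r3 = rooms.createOrJoin("concert", "latecomer");
console.log(r1, r2, r3, rooms.peersOf("concert", "speaker"));
```

The two-person cap fits a single call; one speaker with many listeners would need a higher limit and, with WebRTC, a separate peer connection per listener.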

What are different options for setting up my Signaling Server?

@Muaz, whose super helpful code from his WebRTC Experiments GitHub repos I am forking, has a useful page about all of this.

He suggests using Socket.io/WebSockets (which I’ll use, like him), XMPP/SIP, PHP/MySQL, WebSync/SignalR, PeerServer/SignalMaster, or another gateway.

If you prefer, you can bypass Socket.io and use your own signaling gateway or implementation.

What kind of messages does my Signaling Server need to send?

The bread and butter: the list below is taken from an article by HTML5Rocks.

  • Session messages to begin or end communication.
  • Error messages.
  • Media settings, such as whether to send video or audio, which formats to encode media with, and how much bandwidth to use.
  • Key data for establishing secure connections (how does this work?).
  • Network data, such as a host’s IP address and port as seen by the outside world (what does this mean?).
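One way to pin these message kinds down is as a TypeScript union type, so the server and both browsers agree on the shape of everything they send. This is a sketch under my own assumptions: the `type` names (`offer`, `answer`, `candidate`, `hangup`, `error`) and `parseSignal` are a convention I picked, not something WebRTC mandates. As I understand it, media settings and key data both travel inside the SDP text of the offer and answer (codec lines and a DTLS fingerprint), while network data travels as ICE candidates.

```typescript
// Hypothetical wire format for signaling messages, sent as JSON text.
// The "type" names are a common convention, not part of WebRTC itself.
type SignalMessage =
  | { type: "offer"; sdp: string }           // start a session; SDP carries media settings and key data
  | { type: "answer"; sdp: string }          // accept the offer, with the answerer's own SDP
  | { type: "candidate"; candidate: string } // network data: one ICE candidate (simplified; real ones also carry sdpMid/sdpMLineIndex)
  | { type: "hangup" }                       // end the session
  | { type: "error"; message: string };      // report a problem

// Turn untrusted text from the network into a SignalMessage,
// returning null for anything malformed or unknown.
function parseSignal(text: string): SignalMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(text);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== "object" || data === null) {
    return null;
  }
  const { type: kind, sdp, candidate, message } = data as Record<string, unknown>;
  switch (kind) {
    case "offer":
      return typeof sdp === "string" ? { type: "offer", sdp } : null;
    case "answer":
      return typeof sdp === "string" ? { type: "answer", sdp } : null;
    case "candidate":
      return typeof candidate === "string" ? { type: "candidate", candidate } : null;
    case "hangup":
      return { type: "hangup" };
    case "error":
      return typeof message === "string" ? { type: "error", message } : null;
    default:
      return null; // unknown message type
  }
}

const good = parseSignal('{"type":"offer","sdp":"v=0"}');
const bad = parseSignal('{"type":"offer"}'); // missing sdp, so rejected
console.log(good, bad);
```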

Once User 1 calls User 2, the server needs to pass User 1’s offer to User 2, pass User 2’s answer back to User 1, and pass the ICE candidates (more soon) between them, so that the two browsers can set up the WebRTC connection.
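The order of that exchange matters, so here is a toy simulation of it. `ToyPeer` is a hypothetical stand-in, not the browser’s `RTCPeerConnection`, although its state names borrow from the real `signalingState` values (`stable`, `have-local-offer`, `have-remote-offer`). It also models one rule I understand the real API to enforce: `addIceCandidate()` rejects a candidate that arrives before the remote description (the other side’s offer or answer) has been set, so many examples queue early candidates.

```typescript
// A toy model of the offer/answer/ICE ordering between two peers.
// ToyPeer is hypothetical: it tracks state only, with no media or network.
// Its state names borrow from RTCPeerConnection's signalingState values.
type SigState = "stable" | "have-local-offer" | "have-remote-offer";

class ToyPeer {
  state: SigState = "stable";
  remoteDescription: string | null = null;
  readonly appliedCandidates: string[] = [];
  private pendingCandidates: string[] = [];
  private readonly name: string;

  constructor(name: string) {
    this.name = name;
  }

  // Caller side: make an offer and wait for an answer.
  createOffer(): string {
    this.expect("stable");
    this.state = "have-local-offer";
    return `offer from ${this.name}`;
  }

  // Callee side: apply the caller's offer.
  receiveOffer(offer: string): void {
    this.expect("stable");
    this.state = "have-remote-offer";
    this.applyRemoteDescription(offer);
  }

  // Callee side: answering completes negotiation on this side.
  createAnswer(): string {
    this.expect("have-remote-offer");
    this.state = "stable";
    return `answer from ${this.name}`;
  }

  // Caller side: apply the callee's answer.
  receiveAnswer(answer: string): void {
    this.expect("have-local-offer");
    this.state = "stable";
    this.applyRemoteDescription(answer);
  }

  // Candidates that arrive before the remote description are queued.
  receiveCandidate(candidate: string): void {
    if (this.remoteDescription === null) {
      this.pendingCandidates.push(candidate);
    } else {
      this.appliedCandidates.push(candidate);
    }
  }

  private applyRemoteDescription(description: string): void {
    this.remoteDescription = description;
    // The other side is now known, so apply anything that was waiting.
    this.appliedCandidates.push(...this.pendingCandidates);
    this.pendingCandidates = [];
  }

  private expect(wanted: SigState): void {
    if (this.state !== wanted) {
      throw new Error(`${this.name}: expected ${wanted}, but state is ${this.state}`);
    }
  }
}

// User 1 calls User 2. The signaling server (not modeled) carries each string.
const user1 = new ToyPeer("User 1");
const user2 = new ToyPeer("User 2");

const offer = user1.createOffer();                 // User 1: have-local-offer
user2.receiveCandidate("candidate-a from User 1"); // arrives early, so it is queued
user2.receiveOffer(offer);                         // User 2: have-remote-offer, queue flushed
const answer = user2.createAnswer();               // User 2: stable
user1.receiveAnswer(answer);                       // User 1: stable
user1.receiveCandidate("candidate-b from User 2"); // applied right away
console.log(user1.state, user2.state);
```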

Below is a nice infographic from Tutorials Point.

Diagram: “Building the Server”, showing the communication patterns between User 1, a Server and User 2. Borrowed from the Tutorials Point website.

What is RTCPeerConnection?

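Not sure yet. Do I need to know? According to MDN, it is the WebRTC interface that represents the connection between my computer and a remote peer.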

What are STUN and TURN Servers?

Not sure yet, but I might need to know… these servers apparently help users who can’t connect by the most direct route and instead need to traverse NAT gateways and firewalls. Is this why my friends on AT&T couldn’t connect outdoors in Washington Square Park without local Wi-Fi?

It looks like the WebRTC APIs use STUN servers to discover your computer’s public IP address, as seen from outside your network (why?), and TURN servers to act as relays in case a direct peer-to-peer connection fails.
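In the browser, STUN and TURN servers are handed to the peer connection as a list of `iceServers`, each entry with `urls` and, for TURN, a `username` and `credential`. Below is a sketch of such a list plus a small helper, `splitByKind` (my own hypothetical name), that sorts the URLs by type. The Google STUN address is one that shows up in many tutorials; the TURN host and its credentials are placeholders I made up, since a TURN server has to be run or rented, and relaying media through it costs bandwidth.

```typescript
// Mirrors the shape of the browser's RTCIceServer dictionary
// (urls, optional username/credential) so this runs outside a browser.
interface IceServerEntry {
  urls: string | string[];
  username?: string;
  credential?: string;
}

const iceConfig: { iceServers: IceServerEntry[] } = {
  iceServers: [
    // STUN: lets a browser learn its public IP address and port, as seen
    // from outside its NAT, so the other peer knows where to send media.
    { urls: "stun:stun.l.google.com:19302" },
    // TURN: a relay for when no direct path works. Placeholder host and
    // credentials (assumptions); a real TURN server must be run or rented.
    {
      urls: ["turn:turn.example.com:3478?transport=udp", "turns:turn.example.com:5349"],
      username: "demo-user",
      credential: "demo-password",
    },
  ],
};

// Sort every URL in the config into STUN or TURN buckets by its scheme.
function splitByKind(servers: IceServerEntry[]): { stun: string[]; turn: string[] } {
  const stun: string[] = [];
  const turn: string[] = [];
  for (const server of servers) {
    const urls = Array.isArray(server.urls) ? server.urls : [server.urls];
    for (const url of urls) {
      if (url.startsWith("stun:") || url.startsWith("stuns:")) {
        stun.push(url);
      } else if (url.startsWith("turn:") || url.startsWith("turns:")) {
        turn.push(url);
      }
    }
  }
  return { stun, turn };
}

const kinds = splitByKind(iceConfig.iceServers);
console.log(kinds.stun.length, kinds.turn.length);
// In a browser, the same object would be passed as: new RTCPeerConnection(iceConfig)
```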

How to Actually Build X

Steps I Need to Take

I started a guide for myself here.

References and More Reading

X Y Z

Future Topics

X Y Z