Life of a call part 5: the Switch telephony infrastructure

This is the fifth post in the Life of a Call series by our Cofounder and VP of Engineering, John Rector. Before starting this company, John worked at Microsoft and then at Google where he was the lead software engineer for Google Voice.

Part 1: The Historical Telephone
Part 2: PSTN and Modern Telephony
Part 3: Internet and the Separation of Data & Voice 
Part 4: Switch’s Role

Today’s blog post gives a brief overview of the backend infrastructure that powers both Switch and UberConference.

Broad overview

Within our infrastructure, there are lots of moving parts that work together to set up a single call. The diagram below boils it down to a few key components.

Definitions and a brief walkthrough of call setup:

Caller: First, the caller picks up the phone and dials a phone number associated with Switch or UberConference. Usually the caller is on the PSTN (the Public Switched Telephone Network, which is just the normal old telephone network).

The signaling protocol (how to set up a call) for the PSTN is called SS7. The call request goes through AT&T, Verizon, and other large, well-known networks.
Carrier: Eventually the SS7 request reaches a smaller carrier company such as Bandwidth, Broadvox, or Voxbone. These companies specialize in the interface between the PSTN (SS7, TDM, circuit switched) and the normal Internet (VoIP, SIP, packet switched).

All of our infrastructure is VoIP (Voice over IP) or internet-based. We use SIP (Session Initiation Protocol) to set up calls. None of Switch Communications’ infrastructure uses SS7 or touches the PSTN directly.
Datacenter: An inbound call request eventually reaches our infrastructure, where it is placed on the right bare metal machine. This machine hosts both ends of the call. For UberConference, it hosts all participants and mixes the audio appropriately.

A single machine can host many different Switch calls and many different UberConference conferences at the same time.
Web Application: We have to make sure that the call is placed on the right machine. The algorithm that makes this important decision lives in a “smart” part of the infrastructure.

We also have to handle business logic such as international call credits, suspended users, call logging, and voicemail.

These functions are handled by the Switch and UberConference web applications. Because these web applications are standard and monolithic, it’s easier to host them on Google App Engine. App Engine is a cloud hosting platform much like AWS (Amazon Web Services), which hosts many of the other web applications you use every day.
Outbound Calls to Callee: Once the caller is placed on the right machine, we have to let the callee (the intended recipient) know there’s someone who wants to speak to them. Our web applications then make outbound calls to the callee.

There are two different types of outbound calls. First, if a person has Switch on their desktop, we send a SIP INVITE (an invitation to start a call) to the application on their computer.

Second, if a person has a forwarding phone listed in their account settings, we also make an outbound call there. Since the forwarding number is a PSTN number, this mirrors the process of an inbound call, with carriers, SS7, and so on.
Call Accepted, Audio Bridged: For the normal use case in Switch, we’ll dial out to the person’s client on their computer and any forwarding phone numbers they listed. The first call the callee accepts is bridged, and the other calls are dropped.

The calls are bridged on the right machine and media (voice conversations!) starts flowing between both sides. The web applications are also notified that the call has started. This allows the web app to log the call and implement more features.
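
The walkthrough above can be sketched in a few lines of Python. This is only an illustration, not our production code: the Machine and Leg classes, the function names, and the example addresses are all hypothetical, and picking the least-loaded machine is an assumed stand-in for the real placement algorithm, which this post does not describe.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical bridging machine in the datacenter."""
    name: str
    active_calls: int = 0


@dataclass
class Leg:
    """One outbound attempt to reach the callee (desktop client or forwarding phone)."""
    target: str
    state: str = "ringing"  # becomes "bridged" or "dropped"


def place_call(machines):
    """Choose a machine for a new call. Least-loaded is an assumed policy."""
    machine = min(machines, key=lambda m: m.active_calls)
    machine.active_calls += 1
    return machine


def bridge_first_accept(legs, accepted_target):
    """Bridge the leg the callee answered first and drop all the others."""
    if all(leg.target != accepted_target for leg in legs):
        raise ValueError(f"no outbound leg for {accepted_target}")
    winner = None
    for leg in legs:
        if winner is None and leg.target == accepted_target:
            leg.state = "bridged"
            winner = leg
        else:
            leg.state = "dropped"
    return winner


# Example: two machines, one callee with a desktop client and a forwarding phone.
machines = [Machine("machine-1", active_calls=3), Machine("machine-2", active_calls=1)]
host = place_call(machines)
legs = [Leg("sip:callee@desktop.example"), Leg("tel:+15555550100")]
winner = bridge_first_accept(legs, "tel:+15555550100")
print(host.name, winner.target, [leg.state for leg in legs])
```

In this sketch the web application would call place_call when the inbound request arrives, ring every leg at once, and call bridge_first_accept as soon as one of them answers.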

Design Decisions

When designing our infrastructure, we’ve made a few key decisions to make our backend more reliable and easier to maintain.

Infrastructure is simple: We’ve designed the machines inside our data centers to be very simple, even ‘dumb’. This means they make few, if any, complicated decisions by themselves. Complex logic is restricted to the web applications, where it is easier to develop and test.
Scales easily: By making our worker machines simple, scaling our infrastructure to match traffic is simple too. We just add or remove machines as needed. If a machine fails because of hardware problems, it is easily taken out and another machine picks up the slack.
Cloud whenever possible: As mentioned above, we use both our own data centers, which we manage ourselves, and Google App Engine, which is hosted and managed by Google.

When to use App Engine versus when to use our own machines is an important decision. Usually this comes down to performance. For example, to reduce latency (delay) on active calls, we want to know exactly where the machines that bridge calls are located. This is why we own those machines rather than running them on App Engine, which only gives us a generic location.
APIs for control, REST for alerts/callbacks: Interactions between components in our systems are abstracted into APIs (Application Programming Interfaces). APIs allow each component to evolve internally without breaking other systems.

Our APIs also follow the HTTP REST style, which means their requests are very similar to the HTTP requests your browser makes during normal web browsing. Making and sending such requests is simple and standard.
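
To make the callback idea concrete, here is a rough sketch of what a "call started" notification from a bridging machine to the web application might look like as an HTTP request. The URL layout, the event name, and the JSON fields are assumptions made up for this example, not our actual API. The request is only built here, not sent.

```python
import json
import urllib.request


def build_call_started_callback(base_url, call_id, machine_name):
    """Build (but do not send) a hypothetical 'call started' callback request."""
    payload = {
        "event": "call_started",  # assumed event name
        "call_id": call_id,
        "machine": machine_name,
    }
    return urllib.request.Request(
        url=f"{base_url}/calls/{call_id}/events",  # assumed URL layout
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_call_started_callback("https://webapp.example", "call-123", "machine-2")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would actually send it. Because it is plain HTTP with a JSON body, the receiving web application can handle it like any other web request.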

This marks the end of our Life of a Call series. We hope you liked it!