One lesson we’ve learned as systems engineers is that while hardware is important, it’s difficult to rely on in the long term. As a SaaS company staffed entirely by software engineers, we know that software is our specialty, not hardware. So we’ve got to play to our strengths.
Redundancy & Extra Capacity
The simple answer to hardware unreliability is redundancy: when (not if) a machine fails, a backup has to be ready to take over. We take this one step further: wherever possible, every backup has a backup of its own. But backups are useless if they don’t work as intended in an emergency. That’s why we regularly test our backups in software. For example, we send synthetic test calls through our infrastructure and simulate failures to make sure every component responds appropriately.
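A minimal Python sketch of a backup-of-a-backup chain and a synthetic failure test might look like the following. The names `Node`, `route_call`, and `synthetic_check` are hypothetical, and real failover would also involve health checks and call signaling that this post doesn’t describe. The sketch only illustrates the idea: fail machines in software, place a fake call, and confirm it lands on the next backup in line.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical call-handling machine; `healthy` is flipped to simulate failures."""
    name: str
    healthy: bool = True


def route_call(chain: list[Node]) -> str:
    """Return the first healthy node in a primary -> backup -> backup-of-backup chain."""
    for node in chain:
        if node.healthy:
            return node.name
    raise RuntimeError("every node in the chain is down")


def synthetic_check(chain: list[Node]) -> list[str]:
    """Place one fake call per scenario, failing the first k nodes (k = 0, 1, ...).

    Returns the node that handled each fake call. The chain's original
    health flags are restored afterwards, so the check is non-destructive.
    """
    saved = [node.healthy for node in chain]
    handlers = []
    try:
        for k in range(len(chain)):
            for i, node in enumerate(chain):
                node.healthy = i >= k  # nodes before position k are "failed"
            handlers.append(route_call(chain))
    finally:
        for node, state in zip(chain, saved):
            node.healthy = state
    return handlers


if __name__ == "__main__":
    chain = [Node("primary"), Node("backup"), Node("backup-of-backup")]
    print(synthetic_check(chain))  # ['primary', 'backup', 'backup-of-backup']
```

If any scenario returned the wrong handler (or raised), the check would flag a backup that isn’t actually ready, which is exactly what you want to learn before a real outage.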
Do it in Software
Another way we put this engineering value into practice is by running specialized software on generic hardware. Take the SBC (session border controller), for example. It’s a key machine that negotiates signaling during call setup. It’s much better to have 100 SBCs each handling 100 calls than 1 SBC handling all 10,000. Here’s why:
- Bug fixes and new features can be pushed easily
- When a machine fails, only a small number of users are affected (with 100 SBCs, one failure touches 100 calls, or 1% of the total, instead of all 10,000)
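To make that arithmetic concrete, here is a small Python sketch that spreads 10,000 calls across SBCs with a least-loaded policy and measures what fraction of calls a single failure would drop. The least-loaded policy and the function names `assign_calls` and `blast_radius` are assumptions for illustration, not necessarily how production SBCs are balanced, and the sketch treats every call as equal.

```python
from __future__ import annotations

import heapq


def assign_calls(num_calls: int, num_sbcs: int) -> list[int]:
    """Place each call on the least-loaded SBC and return the per-SBC call counts."""
    # (active calls, SBC index); a list sorted ascending is already a valid min-heap
    heap = [(0, sbc) for sbc in range(num_sbcs)]
    counts = [0] * num_sbcs
    for _ in range(num_calls):
        load, sbc = heapq.heappop(heap)
        counts[sbc] = load + 1
        heapq.heappush(heap, (load + 1, sbc))
    return counts


def blast_radius(num_calls: int, num_sbcs: int) -> float:
    """Fraction of all calls dropped if the busiest single SBC fails."""
    counts = assign_calls(num_calls, num_sbcs)
    return max(counts) / num_calls


if __name__ == "__main__":
    for sbcs in (1, 100):
        print(f"{sbcs:>3} SBC(s): one failure drops {blast_radius(10_000, sbcs):.0%} of calls")
```

With one SBC, a failure drops 100% of calls; with 100 evenly loaded SBCs, it drops 1%.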
Reuse Extra Capacity for Maintenance
Finally, there’s one more way we put this value into practice. At Switch, we’ve committed to a 24/7, no-downtime system. Not even for maintenance. But maintenance (pushing new code to servers) still needs to happen. So how does that work?
The key is relying on backups and reusing extra capacity. When we’re ready to push new code, we first pick a subset of servers and drain them. Think of it as draining a bathtub: you shut off the tap and pull the stopper. Active calls on a draining server continue normally, but no new calls are routed to it; the rest of the fleet’s spare capacity absorbs them instead. Then we wait for the existing calls to end naturally. After a few hours, the server is completely empty and ready to be updated with new code.
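The draining procedure above can be sketched as a tiny in-memory model in Python. `Server`, `Router`, and the least-loaded choice in `place_call` are hypothetical; a real deployment would likely drain through its load balancer or SBC configuration, which this post doesn’t cover. The point is the state transition: a drained server keeps its active calls, receives no new ones, and becomes safe to update only once it is empty.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical call server that can be drained before a code push."""
    name: str
    draining: bool = False
    active_calls: set[str] = field(default_factory=set)

    def ready_for_update(self) -> bool:
        # The "bathtub" is empty: no new calls arrive and the old ones have ended.
        return self.draining and not self.active_calls


class Router:
    """Hypothetical router that sends new calls only to servers that are not draining."""

    def __init__(self, servers: list[Server]) -> None:
        self.servers = servers

    def drain(self, names: set[str]) -> None:
        # "Shut off the tap": stop routing new calls to these servers.
        for server in self.servers:
            if server.name in names:
                server.draining = True

    def place_call(self, call_id: str) -> Server:
        # Spare capacity on the non-draining servers absorbs new calls.
        candidates = [s for s in self.servers if not s.draining]
        if not candidates:
            raise RuntimeError("no server is accepting new calls")
        target = min(candidates, key=lambda s: len(s.active_calls))
        target.active_calls.add(call_id)
        return target

    def end_call(self, call_id: str) -> None:
        # Calls end naturally, so a draining server empties over time.
        for server in self.servers:
            server.active_calls.discard(call_id)


if __name__ == "__main__":
    a, b = Server("a"), Server("b")
    router = Router([a, b])
    router.place_call("call-1")              # lands on "a" (both idle; first wins the tie)
    router.drain({"a"})
    print(router.place_call("call-2").name)  # "b": "a" no longer takes new calls
    print(a.ready_for_update())              # False: call-1 is still in progress
    router.end_call("call-1")
    print(a.ready_for_update())              # True: safe to push new code to "a"
```

In this model, setting `draining` back to `False` after the update would return the server to rotation, and the next subset could then be drained the same way.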
This draining process lets systems engineers push new code during normal working hours, when colleagues are available to help. That’s much better than pushing at some crazy time like 4 a.m. on a Sunday and then praying everything still works on Monday.
Keeping it Real
Many business strategy articles say generic things like ‘play to your strengths’ and ‘stay focused’. But what’s really interesting is how those values are implemented in real life.
One value our engineering team has taken to heart is Software over Hardware. Starting from that value, we’ve built some cool things: multiple layers of hardware redundancy, specialized software on generic hardware, and reuse of extra capacity for maintenance.
For a SaaS (software as a service) company like us, values like these are not just words; they’re key to stability and good development.