Friday, May 31, 2013

FreePBX/Asterisk Failover using IAX2 and bandwidth.com

I have a site that REQUIRES zero dropped calls.  These are sales calls, and they bring in revenue. If the office cannot be contacted, not only could we lose that referral, but the referrer may not bother to send more sales our way.
I am offsite.
In fact, I'm out of state. Too far to quickly physically resolve an issue.


I've been using Asterisk for almost 10 years, and it just keeps getting better.  I once used it to fail over a dying ACD via a T1 crossover cable, and now I've had the opportunity to do it with a pure Asterisk solution.

Detail -
Bandwidth.com provides SIP trunks for $25/month/line.  They use SRV records to provide high availability.  SRV records work like MX records: the host with the lowest priority number gets traffic first, and the next host is tried only if that one is unreachable.  My site has an after-hours answering service, so the first step was to create an Amazon Cloud instance of FreePBX and two SRV records - one record for the office Asterisk machine, and the second for the Amazon Cloud one.  The Amazon one only forwarded to the answering service.
This ensured that, no matter what occurred, all calls would be answered.
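
To make the SRV ordering concrete, here is a minimal Python sketch of how a caller might rank the two records. The hostnames, port 5060 and the priority values are assumptions for illustration, not the real records, and ordering equal-priority records by weight is a simplification of the weighted random pick that RFC 2782 describes.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int  # lower value is tried first
    weight: int    # only matters among records with equal priority
    port: int
    target: str


def failover_order(records):
    """Return targets in the order a caller would try them.

    Simplification: records that share a priority are ordered by
    descending weight rather than by a weighted random pick.
    """
    ranked = sorted(records, key=lambda r: (r.priority, -r.weight))
    return [r.target for r in ranked]


# Hypothetical hostnames and values; the real ones are not given here.
records = [
    SrvRecord(priority=20, weight=0, port=5060, target="cloud-pbx.example.com"),
    SrvRecord(priority=10, weight=0, port=5060, target="office-pbx.example.com"),
]

print(failover_order(records))
```

The office PBX sorts first because it has the lower priority number; the cloud instance is only reached when the office host does not answer.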

Unfortunately, a bouncing circuit then caused complete havoc with our phones.  Calls were dropping and cutting in and out, and I wasn't able to remotely access the PBX to shut the trunk down and force failover.  We realized the Amazon Cloud solution wasn't enough.  These referrals are time sensitive.  If we can't reach our email to retrieve the answering service notification and call the customer within a very short time, then the sale is still lost.  The only real solution is 100% uptime during business hours.

Going All Out
I procured a CradlePoint MBR1400 firewall/router with a Verizon 4G modem and static IP service from Verizon.  The CradlePoint can fail over between interfaces. For example, if your cable provider goes out, it can automatically switch to the 4G service.  That's great for web browsing, but doesn't work so well for SIP trunks.  Asterisk has to know its external IP address, and I'm not sure if STUN servers work for SIP trunks.  If the external IP is incorrect in Asterisk, we lose the audio stream.  Therefore, I hard-code it.  In addition, using the 4G modem means a fibre cut in the street doesn't affect us.
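
Why the hard-coded external address matters can be sketched in a few lines of Python. This is not Asterisk's code, only an illustration of the decision a PBX behind NAT has to make when it tells a peer where to send audio. The /24 mask on the PBX VLAN and the 203.0.113.10 address (a documentation placeholder standing in for the Verizon static IP) are assumptions.

```python
import ipaddress

# Assumed values for illustration only.
LOCAL_NET = ipaddress.ip_network("172.16.200.0/24")  # PBX VLAN; the /24 mask is a guess
PBX_LAN_IP = "172.16.200.18"                         # backup PBX address on the VLAN
EXTERNAL_IP = "203.0.113.10"                         # placeholder for the Verizon static IP


def media_address_for(peer_ip: str) -> str:
    """Pick the address to advertise for audio to a given peer.

    Peers on the local network can reach the LAN address directly.
    Everyone else must be told the public address, or the audio
    stream is sent to an unroutable private IP and is lost.
    """
    if ipaddress.ip_address(peer_ip) in LOCAL_NET:
        return PBX_LAN_IP
    return EXTERNAL_IP


print(media_address_for("172.16.200.12"))  # a peer on the PBX VLAN
print(media_address_for("198.51.100.7"))   # a peer out on the Internet
```

If the router silently switched to a different uplink, EXTERNAL_IP would no longer match the address the carrier sees, which is the failure described above.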

Once the CradlePoint was installed and tested on the PBX VLAN, I installed a 2nd instance of FreePBX on the same VLAN, and added my bandwidth.com trunks.
The BackupPBX uses the CradlePoint as the default route, and the NAT settings are set to the Verizon static IP.  
Then I created an IAX2 trunk between the primary and failover PBXs:

On the Backup PBX -
Trunk Name:  PrimaryPBX
PEER Details:
username=BackupPBX
secret=password
host=172.16.200.12
type=friend
context=from-internal
qualify=yes
qualifyfreqok=25000
transfer=no
trunk=yes
forceencryption=yes
encryption=yes
auth=md5

On the Primary PBX -
Trunk Name: BackupPBX
PEER Details:
username=PrimaryPBX
secret=password
host=172.16.200.18
type=friend
context=from-pstn
qualify=yes
qualifyfreqok=25000
transfer=no
trunk=yes
forceencryption=yes
encryption=yes
auth=md5

Note the context on the BackupPBX Trunk, from-pstn.  This is because we want to push PSTN-sourced calls from the Backup into the Primary.
Also, if you use a dial prefix, make sure you add that to the Trunk settings on the Primary.  You probably strip a '9' from the beginning of the dial string normally, so you need to add it back in when sending the call via the backup PBX.
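
The prefix round trip looks like this as a small Python sketch. The function names and the sample number are hypothetical; in FreePBX the same thing is done with the strip and prepend fields in the route and trunk dial patterns.

```python
def strip_prefix(dialed: str, prefix: str = "9") -> str:
    """What the Primary's outgoing route normally does: drop the leading '9'."""
    return dialed[len(prefix):] if dialed.startswith(prefix) else dialed


def prepend_prefix(number: str, prefix: str = "9") -> str:
    """What the BackupPBX trunk on the Primary must do: put the '9' back."""
    return prefix + number


dialed = "915555550100"                 # sample number a user might dial
on_primary = strip_prefix(dialed)       # number as the Primary's route passes it on
to_backup = prepend_prefix(on_primary)  # number as handed to the Backup over IAX2

print(on_primary)
print(to_backup)
```

The Backup then sees the same digits a local user would have dialed, so its own outgoing route, which also expects the '9', matches.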

Then, on the backup, create an Incoming Route with the destination 'Trunk' 'PrimaryPBX', and create an Outgoing Route that matches the Outgoing Route settings on the Primary.

On the Primary, add the BackupPBX Trunk as the last Trunk in your Outgoing Route.
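
Putting the BackupPBX Trunk last means it is only used when everything ahead of it fails. A rough Python sketch of that ordering follows. The name "bandwidth-sip" is a made-up label for the existing SIP trunk, and checking trunk state up front is a simplification: the real route moves to the next trunk when a call attempt on the current one fails.

```python
def pick_trunk(trunk_sequence, is_up):
    """Return the first trunk in the route that is usable, or None."""
    for trunk in trunk_sequence:
        if is_up(trunk):
            return trunk
    return None


# Trunk order on the Primary's outgoing route; "bandwidth-sip" is hypothetical.
route = ["bandwidth-sip", "BackupPBX"]

# Normal day: the SIP trunk is up, so the backup is never touched.
print(pick_trunk(route, lambda t: True))

# Internet circuit down: the SIP trunk fails, the call goes out via BackupPBX.
print(pick_trunk(route, lambda t: t != "bandwidth-sip"))
```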

Now we have a complete backup.  If the Internet goes down, the SRV records take care of incoming call failover, and the BackupPBX Trunk/route takes care of outgoing call failover.  Completely seamless to the user.
Obviously this doesn't cover a power outage.  Due to the nature of the business, though, most of these sales are local, and if there is a major power issue that exceeds the UPS run time, then nobody is making those calls anyway.  The Amazon Cloud PBX will be shut down, and I will run a '3rd tier backup' PBX in a VM on my local servers for the answering service.
