In previous posts we covered the hardware and processes that should be in place to ensure stable Internet access. This last post addresses the reliability of access networks. Our goal is to describe a low-cost yet highly reliable Internet access built from two DSL lines from different carriers. The setup described here can easily be adapted to cable lines, which are even easier to work with.
Traffic segregation across two different lines, with automatic failover (aka protection switch) for both of them. Reliable Internet access.
No interference between critical server/VPN traffic and the anything-goes workstation traffic.
Access uptime numbers that could otherwise only be achieved with more expensive lines.
What we need
A Linux-based router with 3 network interfaces (and iproute2, htb, iptables)
2 DSL lines with static IPs from different carriers
2 DSL modems (bridged mode)
We will use one interface for each DSL modem and run PPPoE on it, so that the Linux machine holds the public IPs. This way the port forwarding and routing configurations are independent of the DSL device.
One of these interfaces will carry server-related traffic (selected by the servers' LAN IPs), while the other carries Internet access from the internal workstations. The remaining interface connects the router, and through it the two external lines, to the internal networks.
For the router we use a Soekris Engineering net4501 board with Pyramid Linux, but any PC or server with a regular Linux distribution will do.
Once developed, the implementation is quite simple and based upon a few scripts:
Configures traffic filtering and routing rules for the normal situation. Runs on startup.
Configures QoS on the interfaces. Runs each time a PPPoE session is (re)started
Monitors both connections and triggers the failover actions when necessary.
There is also an extra script, igw-common.sh, which contains the common variables and functions. The existing servers, interface names and IPs are defined in this script.
The trickiest parts of the setup are the creation of two independent routing tables in igw-iptables.sh and the monitoring of the connection state plus the corresponding failover switch.
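As an illustration only, a file such as igw-common.sh might define variables along these lines. INTIF, SERVERS and SRVTABLE appear in the commands later in this post; every other name here (SRVIF, SRVIP, SRVGW, WKSIF, WKSIP, WKSGW, WKSTABLE, INTNET) and every value is a placeholder invented for this sketch, not taken from the real script.

```shell
# Hypothetical layout for a shared variables file; all values are placeholders.
INTIF=eth0                  # LAN-facing interface
INTNET=192.168.1.0/24       # internal network

# Line used for server traffic (assumed names and documentation-range IPs)
SRVIF=ppp0                  # PPPoE session on the first DSL modem
SRVIP=192.0.2.10            # static public IP on that line
SRVGW=192.0.2.1             # ISP gateway on that line
SRVTABLE=101                # numeric routing table ID for this line

# Line used for workstation traffic (assumed names and documentation-range IPs)
WKSIF=ppp1
WKSIP=198.51.100.10
WKSGW=198.51.100.1
WKSTABLE=102

# LAN addresses of the internal servers, space separated
SERVERS="192.168.1.10 192.168.1.11"
```

Numeric table IDs are used here because they work without further setup; named tables would additionally need entries in /etc/iproute2/rt_tables.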
Routing is controlled by a small number of rules which revolve around the default routing table plus two specific routing tables created for the purpose of multihoming.
Each of the two specific routing tables is assigned to the IP of one interface, so that packets sent from that IP (e.g., replies belonging to connections opened from the outside to that interface) are routed through the corresponding ISP gateway.
ip rule add from $EXTIP table $EXTTABLE
ip route add 192.168.1.0/24 dev $INTIF src $EXTIP table $EXTTABLE
ip route add $EXTGW dev $EXTIF src $EXTIP table $EXTTABLE
ip route add default via $EXTGW table $EXTTABLE
Packets originating from internal server IPs are routed according to the specific table for servers.
for i in $SERVERS; do
    ip rule add from $i table $SRVTABLE
done
Locally generated connections (usually none in the case of a pure gateway) are routed according to the default routing table, where a default gateway is set (it can be the gateway of either interface).
ip route add default scope global nexthop via $EXTGW
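Since the same four per-line commands have to run once for each uplink, they can be wrapped in a function. The following is a minimal sketch, not the contents of igw-iptables.sh: the names run and setup_line, the DRY_RUN switch, the interface names and all addresses are assumptions made for the example. With DRY_RUN=1 (the default here) the commands are only printed, so the sketch can be tried without root.

```shell
#!/bin/sh
# Sketch: apply the per-line routing commands to both uplinks.
# DRY_RUN=1 only prints the commands; DRY_RUN=0 (as root) executes them.
DRY_RUN=${DRY_RUN:-1}

# run CMD ARGS...: print or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_line EXTIF EXTIP EXTGW EXTTABLE
# Fills one routing table and binds it to the public IP of one line.
setup_line() {
    EXTIF=$1; EXTIP=$2; EXTGW=$3; EXTTABLE=$4
    run ip rule add from "$EXTIP" table "$EXTTABLE"
    run ip route add 192.168.1.0/24 dev "$INTIF" src "$EXTIP" table "$EXTTABLE"
    run ip route add "$EXTGW" dev "$EXTIF" src "$EXTIP" table "$EXTTABLE"
    run ip route add default via "$EXTGW" table "$EXTTABLE"
}

# Placeholder values (documentation address ranges, assumed interface names).
INTIF=eth0
setup_line ppp0 192.0.2.10 192.0.2.1 101
setup_line ppp1 198.51.100.10 198.51.100.1 102
```

Running the sketch as is prints eight ip commands, four per line, which makes it easy to review the rules before applying them for real.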
A word about protection switch
Switching from the working situation to the protection situation takes some time in this setup, for several reasons. First of all, the connection is checked at the IP level by pinging other machines on the Internet. One could think of monitoring the first-hop ISP router, but it may answer ping requests while not forwarding traffic (yes, it happens). On the other hand, pinging a single Internet host is not reliable, as that host may be down. Furthermore, pinging a group of hosts has to be done carefully: if it is done during a PPPoE session restart (which happens regularly), it may trigger a false switch. Thus the monitoring process involves some retries and delays, which make it slower to react. If you're looking for a carrier-grade, less-than-50 ms switch, please look somewhere else :-) as it's not possible with DSL/IP.
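The checking logic described above can be sketched as follows. This is not the actual monitoring script: the function names (probe, line_up, line_down_confirmed), the probe addresses, and the retry and delay values are assumptions chosen for illustration. A line is reported as down only when every probe host fails, and only after several consecutive failed rounds.

```shell
#!/bin/sh
# Sketch of a line check with multiple probe hosts and retries.
PROBE_HOSTS="192.0.2.53 198.51.100.53 203.0.113.53"   # placeholder addresses
MAX_FAILS=3       # consecutive failed rounds before declaring the line down
ROUND_DELAY=10    # seconds to wait between rounds

# probe IFACE HOST: send one ping through the given interface.
probe() {
    ping -c 1 -W 2 -I "$1" "$2" >/dev/null 2>&1
}

# line_up IFACE: succeeds as soon as one probe host answers.
line_up() {
    for host in $PROBE_HOSTS; do
        if probe "$1" "$host"; then
            return 0
        fi
    done
    return 1
}

# line_down_confirmed IFACE: succeeds only if MAX_FAILS rounds in a row
# got no answer from any probe host; fails as soon as one round succeeds.
line_down_confirmed() {
    fails=0
    while [ "$fails" -lt "$MAX_FAILS" ]; do
        if line_up "$1"; then
            return 1
        fi
        fails=$((fails + 1))
        if [ "$fails" -lt "$MAX_FAILS" ]; then
            sleep "$ROUND_DELAY"
        fi
    done
    return 0
}
```

A monitoring loop would call something like `line_down_confirmed ppp0` and trigger the switch only when it succeeds. To avoid false switches, MAX_FAILS times ROUND_DELAY should be longer than a normal PPPoE session restart takes, which is exactly why the reaction time cannot be very short.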
It should also be noted that when the server line is down (see srv_switch2protection), the corresponding DNS entries should be pointed to the other line (ideally one should have only one A record plus some CNAMEs), so that the servers keep accepting connections from the outside.
This setup is working in the field with excellent results. It can be tweaked as desired by editing the scripts. If all you need is traffic separation, this setup also works perfectly with two lines from the same carrier. However, such lines are likely to fail simultaneously, since they share the same physical and logical paths.
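On the routing side, one way to picture the switch is to move the per-server rules from the failed line's table to the other line's table. The sketch below is an assumption about how that could look, not the real srv_switch2protection function: move_servers, run, DRY_RUN, the table IDs and the server addresses are all made up for the example, and a real switch may also have to adjust NAT and port-forwarding rules.

```shell
#!/bin/sh
# Sketch: re-point the server source rules at another routing table.
# DRY_RUN=1 only prints the commands; DRY_RUN=0 (as root) executes them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# move_servers FROM_TABLE TO_TABLE
# Replaces each server's "from" rule so its traffic uses TO_TABLE.
move_servers() {
    for i in $SERVERS; do
        run ip rule del from "$i" table "$1"
        run ip rule add from "$i" table "$2"
    done
    # Drop cached routing decisions so the new rules take effect at once.
    run ip route flush cache
}

# Placeholder values: two servers, tables 101 (server line) and 102 (other line).
SERVERS="192.168.1.10 192.168.1.11"
move_servers 101 102
```

Switching back once the server line recovers would be the same call with the table arguments reversed.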
If you have any questions, just leave a comment.