network comm failover to a cluster

March 12th, 2008 | Author: david

We have sites in South America where network connectivity is very unreliable. We have MPLS lines connecting the remote offices in those locations, and we also have Internet access from each of the offices.

When we set up our replication and mail routing architecture in these locations where connectivity is unstable, we got a little creative.

First, our hub servers are in the home office in Hong Kong. These hub servers do all of the replication from hub to spoke, which cuts down on the overall replication overhead on the network. The idea is that the cluster cache doesn’t have to be rebuilt every time if only the hubs replicate to the spokes. At the end of the day, both network utilization and the time required to replicate are actually lower.
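To put rough numbers on the hub-and-spoke argument, here is a minimal Python sketch that counts scheduled replication pairings. It assumes a deliberately simple model in which every pair of servers that exchanges changes needs its own replication schedule, and the hubs also replicate among themselves. The function names and the 2-hub, 10-spoke example are hypothetical, not our real server counts.

```python
def full_mesh_pairs(servers: int) -> int:
    """Replication pairings if every server replicates with every other server."""
    return servers * (servers - 1) // 2


def hub_and_spoke_pairs(hubs: int, spokes: int) -> int:
    """Replication pairings if only the hubs replicate, each hub with each spoke.

    Assumption: the hubs also replicate with each other, counted as a small mesh.
    """
    return hubs * spokes + full_mesh_pairs(hubs)


if __name__ == "__main__":
    # Hypothetical example: 2 hubs in the home office and 10 spoke servers.
    hubs, spokes = 2, 10
    print("full mesh:    ", full_mesh_pairs(hubs + spokes))
    print("hub and spoke:", hub_and_spoke_pairs(hubs, spokes))
```

The mesh count grows with the square of the number of servers, while the hub-and-spoke count grows linearly with the number of spokes, which is the main reason a central hub keeps replication overhead manageable as sites are added.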

We also put a cluster in each site. This reduces downtime in the event that one of the servers goes down. Because we have clusters, we can set the connection document to replicate with the cluster instead of with the individual servers. This is a big benefit, because replication still occurs if one server is down. It also cuts network utilization in half, because replication crosses the WAN only once, to the cluster, instead of once to each of the two servers; cluster replication then keeps the other member up to date locally.
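The halving claim is simple arithmetic. The sketch below counts the hub-to-site replication passes that cross the WAN in one replication cycle, assuming two members per site cluster and that every pass moves roughly the same amount of data. The function `wan_replication_passes` and the five-site example are hypothetical.

```python
def wan_replication_passes(sites: int, members_per_site: int, target_cluster: bool) -> int:
    """Count hub-to-site replication passes that cross the WAN in one cycle.

    Assumption: with a cluster connection document the hub replicates once per
    site, and cluster replication copies the changes to the other members over
    the local network. Without it, the hub replicates with every member.
    """
    if target_cluster:
        return sites
    return sites * members_per_site


if __name__ == "__main__":
    # Hypothetical region: 5 sites, each with a 2-member cluster.
    print("per server: ", wan_replication_passes(5, 2, target_cluster=False))
    print("per cluster:", wan_replication_passes(5, 2, target_cluster=True))
```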

From the hubs, we set up one direct connection document to connect to remote cluster member1 over MPLS. We do this for the primary servers; let’s say the primary server in the cluster is member1. Since we use clustering for failover and redundancy rather than to share the servers’ load, most users’ home servers are set to member1. Because the MPLS lines are supplied by a provider that is responsible for them under an SLA (Service Level Agreement), we rely mostly on MPLS.

The direct connection document from the hub to remote cluster member2 is set to connect via member2’s Internet address. Both direct connection documents have replication and mail routing disabled. The cluster connection document needs these two normal connection documents so that the hubs know how to reach each cluster member: the cluster connection document has only one “Optional network address” field, but the cluster has two servers.

Each machine has two different external IP addresses, mapped through the firewalls/routers. One IP address connects you over MPLS, and the other is the server’s public NAT’d IP address on the firewall.

RESULT: We have Domino failover for replication because of the cluster connection document, and we have network-level failover because we reach the remote servers over different paths. Specifically, hub1 is set to replicate with member1 via MPLS and with member2 via the Internet, while hub2 is set up to replicate with member1 via the Internet and with member2 via MPLS. This means that if, for example, member1 is down and MPLS is also down, replication still reaches member2 over the Internet from hub1.
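The failover matrix above is easy to check mechanically. This minimal Python sketch encodes the four hub-to-member paths described in this post and tries every combination of at most one cluster member outage and one network path outage. It is a simplified reachability model, assuming that any surviving hub/member/path combination can carry replication; it ignores connection schedules, hub outages, and the details of how Domino picks a cluster member.

```python
from itertools import product

# Hub-to-member paths as described above:
# hub1 -> member1 via MPLS, hub1 -> member2 via Internet,
# hub2 -> member1 via Internet, hub2 -> member2 via MPLS.
ROUTES = {
    ("hub1", "member1"): "MPLS",
    ("hub1", "member2"): "Internet",
    ("hub2", "member1"): "Internet",
    ("hub2", "member2"): "MPLS",
}

MEMBERS = ("member1", "member2")
LINKS = ("MPLS", "Internet")


def working_routes(down_member, down_link):
    """Return the (hub, member) pairs that can still replicate.

    down_member / down_link name at most one failed cluster member and one
    failed network path; use None for "nothing failed".
    """
    return sorted(
        pair
        for pair, link in ROUTES.items()
        if pair[1] != down_member and link != down_link
    )


if __name__ == "__main__":
    # Try every combination of one member outage and one link outage.
    for down_member, down_link in product((None,) + MEMBERS, (None,) + LINKS):
        routes = working_routes(down_member, down_link)
        assert routes, f"no path left: {down_member=}, {down_link=}"
        print(f"member down={down_member}, link down={down_link}: {routes}")
```

For example, `working_routes("member1", "MPLS")` returns `[('hub1', 'member2')]`, matching the scenario described above. The crossed assignment (hub1 and hub2 use opposite paths to each member) is what guarantees a surviving path in every single-member, single-link failure.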

We do the same within the region. We’ve bent our rules a little because of the high latency from the home office in Hong Kong to the remote sites, so we’ve set up replication from the smaller sites, such as Chile and Uruguay, to the larger site’s cluster over both the site-to-site MPLS line and the Internet.

We also put all of the servers in this slow region in the same NNN (Notes Named Network), so mail does not have to route from one server in the region across the slow links to the hub and then back down a slow link to a second server in the same region.
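Here is a minimal Python sketch of why a shared NNN saves slow-link traffic. It assumes a simplified routing model in which servers in the same NNN deliver mail to each other directly and everything else is routed through the hub. The server names `chile1` and `uruguay1` and the NNN names are hypothetical, and real Domino routing also depends on network protocols and connection documents.

```python
def mail_route(sender, recipient, nnn_of, hub="hub1"):
    """Return the server hops a message takes in this simplified model.

    Servers in the same NNN deliver to each other directly; otherwise the
    message is assumed to be routed through the hub.
    """
    if sender == recipient:
        return [sender]
    if nnn_of[sender] == nnn_of[recipient]:
        return [sender, recipient]
    return [sender, hub, recipient]


def hub_link_crossings(route, hub="hub1"):
    """Count the hops in a route that go to or from the hub (the slow links)."""
    return sum(1 for a, b in zip(route, route[1:]) if hub in (a, b))


if __name__ == "__main__":
    # Hypothetical NNN assignments before and after putting the region in one NNN.
    separate = {"chile1": "NNN-Chile", "uruguay1": "NNN-Uruguay", "hub1": "NNN-HK"}
    shared = {"chile1": "NNN-Region", "uruguay1": "NNN-Region", "hub1": "NNN-HK"}

    for label, nnn_of in (("separate NNNs", separate), ("shared NNN", shared)):
        route = mail_route("chile1", "uruguay1", nnn_of)
        print(f"{label}: {route}, slow-link hops = {hub_link_crossings(route)}")
```

With separate NNNs, a Chile-to-Uruguay message crosses the slow link twice (up to the hub and back down); with a shared NNN it crosses it zero times.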

So far, so good: it’s working well.

Posted in Administration, Connection Docs, Failover, Mail Routing | Tags: Administration, connectiondocs, Failover, Mailrouting

Symetrik Domino Consulting
