Wednesday, December 28, 2011

MutliSubnetFailover – the DNS conundrum

I love new technology and I love new ideas, but sometimes the coolest of those are somewhat not thought through… When Microsoft announced the support for multisubnet clustering in SQL 2012 I was cheerful, as this is a feature that can come in really handy in time like these. Unfortunately in the real world this new feature soon proved to be … let’s say … improvable.

Here’s the deal: When you have a virtual network name (VNN) in a cluster with multiple IP addresses behind it in the end what that does is automatically reset DNS entries whenever you failover the group. Now as all of you know (and if you don’t know it’s still a fact unfortunately…) client computers have this nasty thing that’s called “DNS resolver cache”, which basically means that names that have already been resolved to IP addresses will be cached locally and not sent to the DNS server again for a while. (That while is called the “Time to Live”, or TTL for short.) VERY good idea in normal life, otherwise DNS Servers would have a real problem and every request would be WAY slow… Unfortunately… VERY bad idea for failover scenarios like ours with the DB, especially if you require real high availability…

OK, now, Microsoft is not THAT stupid… in fact they have quite smart people there and those people thought about this problem too. What they did to tackle this was quite simple: Make the client driver aware of the problem so it can solve it. The way they did this was… well… Interesting at best… They changed the behavior of the VNN so it would always register all IPs in DNS, not only the active one, (ok, this was not a real change, cluster service had this feature for a while now, they just use it…) and build a change into the client to get all those IPs and automatically connect and reconnect to the active one.

Nice idea in theory… The problem starts when you begin using applications that are not aware of that idea. Why? Because what happens by default when you have multiple IPs in DNS is that the client fetches a random one (OK, not really random, but it doesn’t matter for now…) and places that one in his cache. If that one is the one that is currently inactive… Well… Let’s call it a bad day… You will just not get a connection open, and the server will not fetch a new IP because the resolver cache already has one… So… Tough luck…

OK… Enough with that… So how do you overcome this problem? Well, unfortunately there is no really good way to do it… (Unless of course all your applications can be made aware of the situation by using the latest drivers…) Here is what I would do for now as a next best thing to a solution:

  1. Stop the DNS behavior of the VNN, let it run as it would have normally. Unfortunately you can’t change this in the UI, only via commandline: cluster.exe RES <ResourceName> /PRIV RegisterAllProvidersIP=0
  2. Reduce the TTL of that DNS entry to whatever is the longest you can possibly wait for a failover. I chose 15 seconds for that, but feel free to go your own way. Luckily this setting can be changed in the DNS management UI.
  3. Still keep the MultiSubnetFailover=True parameter in the apps that support it. It doesn’t really help, but maybe you reach the point at some time where you have a homogenous system again and can use that new pattern.

There is an alternative way too that might even be better than this one. But I would strictly only recommend this if your system is rather small. (In numbers of DBs…) You could leave the SQL listener as it is and connect your MutliSubnet aware applications there. Then add a second VNN to the group that is set with minimal TTL and no RegisterAllProvidersIP flag to connect your legacy apps to. This setup will give you maximum availability and flexibility, but it requires extra work and extra IP addresses…