Monday, January 14, 2008

ifconfig vs arping

Today's technical oddity comes to us as part of the Solaris Zones cluster system that I wrote. It's hardcore and I rolled it myself.

We use service IP addresses (i.e. one IP address for each service we provide) and when we move the service IP address we need to have the router find out that it has moved. That's easy with arping. We just send out an unsolicited ARP reply to the router to update the ARP table entry for the service IP address.
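For the curious, here is roughly what such an unsolicited ARP reply looks like on the wire. This is a minimal sketch in Python rather than anything from the cluster scripts: it only builds the 42-byte frame and does not send it. The function names and the MAC and IP addresses are made up for illustration.

```python
import socket
import struct


def mac_bytes(text):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in text.split(":"))


def build_unsolicited_arp_reply(src_mac, service_ip, dst_mac, dst_ip):
    """Return a raw Ethernet frame carrying an ARP reply for service_ip.

    src_mac / service_ip: the node that now owns the service address.
    dst_mac / dst_ip: the router whose ARP table should be updated.
    """
    # Ethernet header: destination MAC, source MAC, EtherType 0x0806 (ARP).
    ethernet_header = struct.pack("!6s6sH", dst_mac, src_mac, 0x0806)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                             # hardware type: Ethernet
        0x0800,                        # protocol type: IPv4
        6,                             # hardware address length
        4,                             # protocol address length
        2,                             # opcode 2 = ARP reply
        src_mac,                       # sender hardware address
        socket.inet_aton(service_ip),  # sender protocol address
        dst_mac,                       # target hardware address
        socket.inet_aton(dst_ip),      # target protocol address
    )
    return ethernet_header + arp_payload


# Hypothetical addresses, chosen from documentation/local ranges.
frame = build_unsolicited_arp_reply(
    mac_bytes("02:00:00:00:00:01"), "192.0.2.10",
    mac_bytes("02:00:00:00:00:fe"), "192.0.2.1",
)
```

The router never asked for this reply, but it sees the service IP paired with the new node's MAC address in the sender fields and updates its table entry accordingly.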

Each zone runs a particular service, and we programmatically create and destroy zones as we move them between cluster nodes.

Anyway, with the latest patches an odd thing started happening: when my script would ifconfig an IP address directly into a zone and then arping that address to the router, the address would get removed from the ARP table on the Solaris machine! To make matters worse, when I did it by hand it always worked.

After some dinking around with it, Matt and I figured out that somehow the Solaris machine itself must be listening to the unsolicited ARP reply, and that the reply must be racing with the ifconfig command. My current theory is that ARP table entries are timestamped in whole seconds of Epoch time, so two updates that land within the same second can't be ordered correctly. That would also explain why doing it by hand always worked: nobody types that fast.

The solution?

sleep(1);

*Sigh*
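In script form the workaround amounts to: plumb the address, wait a second, then announce it. Below is a minimal Python sketch of that ordering, not the actual cluster script. The name move_service_ip and its parameters are made up, and the two commands are placeholders because the exact ifconfig and arping arguments depend on the system.

```python
import subprocess
import time


def move_service_ip(plumb_cmd, announce_cmd, settle_seconds=1.0,
                    run=subprocess.check_call, sleep=time.sleep):
    """Bring up a service IP, pause, then announce it to the router.

    plumb_cmd / announce_cmd are argument lists supplied by the caller
    (for example an ifconfig invocation and an arping invocation).
    run and sleep are injectable so the sequence can be dry-run.
    """
    run(plumb_cmd)         # configure the address into the zone
    sleep(settle_seconds)  # let the clock tick past the local ARP entry
    run(announce_cmd)      # send the unsolicited ARP reply


# Dry run: record the steps instead of executing real commands.
steps = []
move_service_ip(
    ["plumb-placeholder"],
    ["announce-placeholder"],
    run=steps.append,
    sleep=lambda seconds: steps.append(("sleep", seconds)),
)
```

Passing run and sleep in as parameters is just a convenience for testing the ordering without root access; left at their defaults, the function shells out and really waits.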

Maybe I should look in the source...
