We have a cluster set up on 2 machines connected to an HP P2000 as shared storage.
Each of the 2 cluster members has one interface on the client-facing network, a directly connected sync interface for the cluster, and another interface to the storage (HP P2000).
The quorum is configured as disk + network, with a properly set up quorum drive.
So far, the cluster hosts MSDTC + SQL 2008 + a custom application. All servers have the latest patches, and all recent drivers have been installed on the 2 servers forming the cluster.
The situation that I've seen so far is the following:
- Assume all services are on node 1 and it loses its connectivity to the client-facing network (say, because of a switch failure).
- The services fail over properly to the second node, no issues here.
- Connectivity is restored on node 1 (cables reconnected in this simulation) and the clustering service now sees both nodes.
- I now simulate the same network outage scenario to fail the client-facing network on node 2.
- All services try to fail over to the first node, but fail to do so because the disks are still owned by node 2.
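To make the sequence above easier to reason about, here is a toy Python model of it. It does not talk to the cluster at all; ToyCluster, fail_over and the disk_moves flag are made-up names for illustration only, and the single rule it encodes (services only come online on the node that owns the disks) is my assumption about what is going on, not a confirmed diagnosis.

```python
from dataclasses import dataclass

NODES = ("node1", "node2")


@dataclass
class ToyCluster:
    """Hypothetical model: tracks only who owns the services and the disks."""
    service_owner: str = "node1"
    disk_owner: str = "node1"

    def fail_over(self, disk_moves: bool) -> bool:
        """Try to move the services to the other node.

        disk_moves says whether disk ownership follows the services.
        Returns True if the services come online on the target node.
        """
        target = NODES[1] if self.service_owner == NODES[0] else NODES[0]
        if disk_moves:
            self.disk_owner = target
        # Assumed rule: services need the disks on the same node.
        if self.disk_owner != target:
            return False
        self.service_owner = target
        return True


def replay():
    """Replay the two simulated outages described above."""
    cluster = ToyCluster()
    # First outage (node 1 loses the client network): disks follow, all good.
    first = cluster.fail_over(disk_moves=True)
    # Second outage (node 2 loses the client network): disks stay on node 2.
    second = cluster.fail_over(disk_moves=False)
    return first, second, cluster.disk_owner


if __name__ == "__main__":
    print(replay())
```

In this model the second failover fails only because disk ownership never leaves node 2, which matches what I observe; the open question is why the disks are not released in the real cluster.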
Here are some more conditions to help out:
- Neither node loses connectivity to the other components (sync and storage are never lost).
- iSCSI is used on both servers and both see the drives.
- Manually moving the services to server 1 does the job.
- The failure seems to happen regardless of the time elapsed between the failovers.