Hello,
We have an active failover cluster set up under Windows Server 2008. The shared storage is connected through a QLogic iSCSI HBA to a Compellent SAN. Today we had an issue where the cluster servers were unable to talk to the shared storage. Once the issue was resolved, the cluster was still offline (not unexpected). However, when we went to bring the cluster services back online, we got the following error:
Cluster physical disk resource 'Cluster Disk 1' cannot be brought online because the associated disk could not be found. The expected signature of the disk was '3F489FCF'. If the disk was replaced or restored, in the Failover Cluster Management snap-in, you can use the Repair function (in the properties sheet for the disk) to repair the new or restored disk. If the disk will not be replaced, delete the associated disk resource.
This prevented us from bringing any of the clustered services (MSDTC and MSSQL) online. What confuses me is that when we checked the disk signature through diskpart, the signature of the disk was '3F489FCF', so the signature of the presented drive matched the "expected signature". The same was true for the other clustered service drives.

After rebooting both nodes, all the clustered drives showed up in Disk Management as "offline". When we tried to bring the clustered drives online through Failover Cluster Management, it would hang while bringing the clustered MSDTC drive online. After the Failover Cluster wizard errored out, we attempted, and succeeded, to bring the MSDTC shared drive online via Disk Management (by right-clicking the drive and telling it to come online). Failover Cluster Management then showed the clustered drive as online; however, the MSDTC service itself refused to start.
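For anyone repeating the signature check described above across several disks, the comparison can be scripted. This is a minimal sketch, assuming (1) the cluster error text is copied verbatim and contains "The expected signature of the disk was '<hex>'", and (2) diskpart's `uniqueid disk` command prints a line of the form `Disk ID: 3F489FCF` for the selected disk (a GUID, possibly in braces, for GPT disks). The helper names are hypothetical; the script only compares text you paste into it and does not talk to the cluster or the disks.

```python
import re
from typing import Optional

# Assumption: matches the wording of the cluster physical disk error message.
EXPECTED_RE = re.compile(r"expected signature of the disk was '([0-9A-Fa-f]+)'")
# Assumption: diskpart "uniqueid disk" prints "Disk ID: <hex>" (or "{GUID}" on GPT disks).
DISKPART_RE = re.compile(r"Disk ID:\s*\{?([0-9A-Fa-f-]+)\}?")


def expected_signature(event_text: str) -> Optional[str]:
    """Pull the expected signature out of the cluster error text, uppercased."""
    m = EXPECTED_RE.search(event_text)
    return m.group(1).upper() if m else None


def presented_signature(diskpart_output: str) -> Optional[str]:
    """Pull the signature the disk actually presents from diskpart output, uppercased."""
    m = DISKPART_RE.search(diskpart_output)
    return m.group(1).upper() if m else None


def signatures_match(event_text: str, diskpart_output: str) -> bool:
    """True only if both signatures were found and are equal (case-insensitive)."""
    exp = expected_signature(event_text)
    got = presented_signature(diskpart_output)
    return exp is not None and exp == got


if __name__ == "__main__":
    event = ("Cluster physical disk resource 'Cluster Disk 1' cannot be brought online "
             "because the associated disk could not be found. "
             "The expected signature of the disk was '3F489FCF'.")
    diskpart = "Disk ID: 3F489FCF"
    print(signatures_match(event, diskpart))
```

A matching result here only confirms what the post already observed: the signature on disk equals the one the cluster expects, so the failure to come online has to be explained by something other than a changed signature.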
The only way we were able to get everything back online was by deleting the MSDTC service and recreating it (we used the same, unchanged drive as the clustered disk for the MSDTC service). We also had to "recreate" the quorum partition (by recreate I mean re-set up the cluster to use node majority with the existing, unchanged quorum drive as the witness disk). Once we did that, we were able to bring the MSSQL cluster disk and services online without any changes to the clustered MSSQL drive.
However, I am wondering: once the drives were available to the servers again and the presented drive signatures matched (the signatures never actually changed), why were we unable to bring the resources online? Why was the Failover Cluster service unable to bring the drive for the MSDTC service online when Disk Management could? And why would the MSDTC service itself fail to start?
Any ideas on where to start looking?