Hi All,
I am currently setting up a 2-node SQL Server cluster, but I am getting errors when testing failover from Node2 to Node1.
Here is a quick overview of what I have done so far.
1. Set up the failover cluster prerequisites on both nodes: public and private networks, cluster disks for Quorum, MSDTC and SQL, etc.
2. Ran the configuration validation before creating the cluster. The validation report completed successfully with no errors or warnings.
3. Created the cluster, created the MSDTC cluster group and installed SQL Server on both nodes.
Now I am running failover tests to check whether the cluster resources will fail over from Node1 to Node2 and from Node2 to Node1.
Failover Test: Active Node is Node1.
1. Disable Public network on Node1.
2. Failover to Node2 -> successful
3. Enable Public network on Node1.
Problem:
After the failover to Node2, I tried to fail the resources back from Node2 to Node1 by disabling the public network on Node2 (which is the active node after the failover from Node1 to Node2), but the cluster resources won't fail back to Node1.
Failback from Node2 to Node1 -> failed
1. Disable Public network on Node2.
2. Failback to Node1 -> failed
- Cluster Name and Cluster IP -> failed
- SQL cluster group (SQL name, SQL IP address, Analysis, SQL Server and SQL Server Agent) -> failed
- MSDTC cluster group -> failed back successfully to Node1
3. Enable Public network on Node 2.
4. Manually bring the Cluster Group and SQL cluster group online
I tried to manually bring the Cluster Group and SQL cluster group online, but they CANNOT come online unless I enable the public network on Node2. I have checked the cluster event log and I am getting Event ID 1077, 1069 and 1205 errors.
Here are some of the logs on the cluster events.
Event ID 1069: The Cluster service failed to bring clustered service or application 'SQL_Group' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.
Event ID 1205: Cluster resource 'SQL IP Address 2 (db-vip)' in clustered service or application 'SQL_Group' failed.
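In case it helps anyone comparing against their own logs, here is a minimal Python sketch of how the failing group and resource names could be pulled out of messages like the two above. The function name `summarize_event` and the regex patterns are my own assumptions based only on the wording of these two messages; they are not part of any cluster tooling, and other events may be worded differently.

```python
import re

# Patterns written against the two messages quoted above; other event
# texts may be worded differently, so treat these as assumptions.
EVENT_RE = re.compile(r"Event ID (\d+):")
GROUP_RE = re.compile(r"clustered service or application '([^']+)'")
RESOURCE_RE = re.compile(r"Cluster resource '([^']+)'")


def summarize_event(line):
    """Return (event_id, group, resource) parsed from one event message.

    Any field that does not appear in the message comes back as None.
    """
    event = EVENT_RE.search(line)
    group = GROUP_RE.search(line)
    resource = RESOURCE_RE.search(line)
    return (
        int(event.group(1)) if event else None,
        group.group(1) if group else None,
        resource.group(1) if resource else None,
    )


if __name__ == "__main__":
    # The two messages exactly as copied from my cluster events.
    sample = [
        "Event ID 1069: The Cluster service failed to bring clustered "
        "service or application 'SQL_Group' completely online or offline.",
        "Event ID 1205: Cluster resource 'SQL IP Address 2 (db-vip)' in "
        "clustered service or application 'SQL_Group' failed.",
    ]
    for line in sample:
        print(summarize_event(line))
```

With the messages above, this points at the 'SQL IP Address 2 (db-vip)' resource in 'SQL_Group' as the one that is actually failing.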
Has anyone experienced the same issue before? I would appreciate it if someone could point me in the right direction to resolve it.
Thanks in advance for your feedback.
BTW, failover and failback work perfectly when I reboot the active node. Resources fail over successfully from Node1 to Node2 and vice versa when I reboot the server.
Thanks again.
Regards,
Ivan