Hi
For the past few weeks we have been experiencing a strange issue with our failover cluster.
We currently have the following setup:
7 nodes: Windows Hyper-V 2008 R2 Core servers
SAN-driven storage with 2 controllers, each with 2 network connections to each host.
We seem to be having an issue where random virtual servers hosted on the nodes are rebooting. This can happen at any time of day, sometimes more than once a day, and it can easily affect more than one host at a time or several over the course of the day.
The nodes themselves do not restart and appear to be running fine, with no connection loss. The servers within the nodes reboot in place but do not fail over to another node.
When looking through the event logs in Failover Cluster Manager, I see the following every time this happens:
Event ID: 1230
Cluster resource 'SCVMM RGVSVR031-T' (resource type '', DLL 'vmclusres.dll') either crashed or deadlocked. The Resource Hosting Subsystem (RHS) process will now attempt to terminate, and the resource will be marked to run in a separate monitor.
Event ID: 1146
The cluster resource host subsystem (RHS) stopped unexpectedly. An attempt will be made to restart it. This is usually due to a problem in a resource DLL. Please determine which resource DLL is causing the issue and report the problem to the resource vendor.
I have looked through the logs on the machines that caused the deadlock, but nothing stands out. It never happens at a set time or on a set day; it is completely random. The servers do come back online, but it's a pain having our systems taken out for at least 15 minutes.
It's not always the same server, but multiple servers have been logged more than once.
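To check whether the repeats really are random, the 1230 messages could be tallied per resource. Below is a rough sketch in Python, assuming the event messages have already been exported as plain text strings. The function name and the second resource name in the sample are made up for illustration; only the message wording comes from the events quoted above.

```python
import re
from collections import Counter

# Matches the resource name in an Event ID 1230 message, based on the
# wording of the event quoted in this post.
RESOURCE_RE = re.compile(r"Cluster resource '([^']+)'")


def count_deadlocked_resources(messages):
    """Count how often each cluster resource is named in the messages.

    Messages that do not name a resource (e.g. Event ID 1146) are skipped.
    """
    counts = Counter()
    for message in messages:
        match = RESOURCE_RE.search(message)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample input: 'SCVMM EXAMPLE-VM' is a hypothetical name, not a real resource.
sample_messages = [
    "Cluster resource 'SCVMM RGVSVR031-T' (resource type '', DLL 'vmclusres.dll') either crashed or deadlocked.",
    "The cluster resource host subsystem (RHS) stopped unexpectedly.",
    "Cluster resource 'SCVMM EXAMPLE-VM' (resource type '', DLL 'vmclusres.dll') either crashed or deadlocked.",
    "Cluster resource 'SCVMM RGVSVR031-T' (resource type '', DLL 'vmclusres.dll') either crashed or deadlocked.",
]

counts = count_deadlocked_resources(sample_messages)
for name, hits in counts.most_common():
    print(f"{hits:3d}  {name}")
```

If a few resources dominate the tally, that would point at those particular servers rather than at the cluster as a whole.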
I'm really stuck on what to do next. Does anyone have any idea what's causing this? All nodes are fully patched, including the latest Service Pack 1.