Channel: High Availability (Clustering) forum

Resource Hosting Subsystem Deadlocks - File Share Witness

Recently, one of the SQL Server 2012 AlwaysOn clusters I manage, running on Windows Server 2008 R2, started experiencing RHS deadlocks on the cluster's File Share Witness resource. When this happens, the cluster fails over to the other node. The cluster runs on VMware, and each node has 2 vCPUs and 10 GB of memory.

The deadlocks only seem to occur when CPU utilization is high on the active node; a large SQL restore will typically trigger one. Other clusters rely on the same File Share Witness server (using different shares) and have not experienced any deadlocks, so I doubt this is a communication issue.

I have been searching online and cannot find a good way to troubleshoot this specific issue when a File Share Witness is involved. Is it possible that CPU starvation could trigger an RHS deadlock? Does anyone have tips or advice for digging further into this?

Here is an excerpt from the cluster.log. I can provide additional logs if they would be beneficial.

000008e8.00000184::2014/09/19-15:20:22.000 ERR   [RHS] RhsCall::DeadlockMonitor: Call ISALIVE timed out for resource 'File Share Witness'.
000008e8.00000184::2014/09/19-15:20:22.000 INFO  [RHS] Enabling RHS termination watchdog with timeout 1200000 and recovery action 3.
000008e8.00000184::2014/09/19-15:20:22.000 ERR   [RHS] Resource File Share Witness handling deadlock. Cleaning current operation and terminating RHS process.
000008e8.00000184::2014/09/19-15:20:22.000 ERR   [RHS] About to send WER report.
0000074c.00000ccc::2014/09/19-15:20:22.000 WARN  [RCM] HandleMonitorReply: FAILURENOTIFICATION for 'File Share Witness', gen(0) result 4.
0000074c.00000ccc::2014/09/19-15:20:22.000 INFO  [RCM] rcm::RcmResource::HandleMonitorReply: Resource 'File Share Witness' consecutive failure count 1.
000008e8.00000184::2014/09/19-15:20:22.100 ERR   [RHS] WER report is submitted. Result : WerReportQueued.
0000074c.00000ccc::2014/09/19-15:21:10.272 ERR   [RCM] rcm::RcmMonitor::RecoverProcess: Recovering monitor process 2280 / 0x8e8
0000074c.00000ccc::2014/09/19-15:21:10.274 INFO  [RCM] Created monitor process 2560 / 0xa00
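
In case it helps anyone scan a longer cluster.log for the same pattern, here is a minimal Python sketch that parses lines in the format shown above, picks out the RHS "timed out for resource" entries, and measures how long it took before RCM logged RecoverProcess. The regular expression is only inferred from this excerpt, and the function names (`parse_line`, `find_deadlocks`, `seconds_until_recovery`) are my own, so treat it as a starting point rather than a complete parser.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt: <pid>.<tid>::<timestamp> <LEVEL> [<component>] <message>
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-fA-F]{8})\.(?P<tid>[0-9a-fA-F]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<component>[^\]]+)\]\s+(?P<msg>.*)$"
)
RESOURCE_RE = re.compile(r"timed out for resource '([^']+)'")


def parse_line(line):
    """Return a dict for one cluster.log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "pid": int(m.group("pid"), 16),  # process ID is logged in hex
        "tid": int(m.group("tid"), 16),  # thread ID is logged in hex
        "time": datetime.strptime(m.group("ts"), "%Y/%m/%d-%H:%M:%S.%f"),
        "level": m.group("level"),
        "component": m.group("component"),
        "msg": m.group("msg"),
    }


def find_deadlocks(lines):
    """Collect RHS entries that report a timed-out call, with the resource name."""
    events = []
    for line in lines:
        entry = parse_line(line)
        if entry is None or entry["component"] != "RHS":
            continue
        hit = RESOURCE_RE.search(entry["msg"])
        if hit:
            entry["resource"] = hit.group(1)
            events.append(entry)
    return events


def seconds_until_recovery(lines):
    """Seconds from the first RHS timeout to the next RCM RecoverProcess entry."""
    start = None
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        if start is None:
            if entry["component"] == "RHS" and RESOURCE_RE.search(entry["msg"]):
                start = entry["time"]
        elif entry["component"] == "RCM" and "RecoverProcess" in entry["msg"]:
            return (entry["time"] - start).total_seconds()
    return None


# Three lines copied from the excerpt above.
SAMPLE = [
    "000008e8.00000184::2014/09/19-15:20:22.000 ERR   [RHS] RhsCall::DeadlockMonitor: "
    "Call ISALIVE timed out for resource 'File Share Witness'.",
    "000008e8.00000184::2014/09/19-15:20:22.000 ERR   [RHS] Resource File Share Witness "
    "handling deadlock. Cleaning current operation and terminating RHS process.",
    "0000074c.00000ccc::2014/09/19-15:21:10.272 ERR   [RCM] rcm::RcmMonitor::RecoverProcess: "
    "Recovering monitor process 2280 / 0x8e8",
]

if __name__ == "__main__":
    for event in find_deadlocks(SAMPLE):
        print(event["time"], event["pid"], event["resource"])
    print(seconds_until_recovery(SAMPLE))
```

To run it against a real log, pass the lines of the file to the same functions, for example `find_deadlocks(open(path, errors="replace"))`, where `path` is wherever the generated cluster.log was saved. Converting the hex process ID also confirms that the RHS process logging the timeout (0x8e8) is the monitor process 2280 that RCM later recovers.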

