Hi,
We currently have a 4-node Server 2012 R2 cluster which hosts, among other things, a 3-node guest cluster running a single clustered file service.
About once a week, the guest cluster node that is currently hosting the clustered file service fails, as if the VM were blue screening. That in itself is fairly annoying, and I'll be applying all the updates and checking the event logs for clues as to the cause.
The bigger problem is that whichever physical cluster node is hosting the VM when it fails will not release its locks on some of the VM's files. The Virtual Machine Configuration resource shows as Online Pending, which means the failed VM cannot be restarted on any other cluster node. The only fix is to drain the physical host it failed on and reboot it.
I'm looking for suggestions on how to fix the following:
1. Crashing guest file cluster node
2. Failed VM with shared VHDX requiring a physical host reboot.
Event messages from the physical host that was hosting the failed VM, in the order they occurred:
- Hyper-V-Worker: Event ID 18590 - 'FS-03' has encountered a fatal error. The guest operating system reported that it failed with the following error codes: ErrorCode0: 0x9E, ErrorCode1: 0x6C2A17C0, ErrorCode2: 0x3C, ErrorCode3: 0xA, ErrorCode4: 0x0. If the problem persists, contact Product Support for the guest operating system. (Virtual machine ID 36166B47-D003-4E51-AFB5-7B967A3EFD2D)
- FailoverClustering: Event ID 1069 - Cluster resource 'Virtual Machine FS-03' of type 'Virtual Machine' in clustered role 'FS-03' failed.
- Hyper-V-High-Availability: Event ID 21128 - 'Virtual Machine FS-03' failed to shutdown the virtual machine during the resource termination. The virtual machine will be forcefully stopped.
- Hyper-V-High-Availability: Event ID 21110 - 'Virtual Machine FS-03' failed to terminate.
- Hyper-V-VMMS: Event ID 20108 - The Virtual Machine Management Service failed to start the virtual machine '36166B47-D003-4E51-AFB5-7B967A3EFD2D': The group or resource is not in the correct state to perform the requested operation. (0x8007139F).
- Hyper-V-High-Availability: Event ID 21107 - 'Virtual Machine FS-03' failed to start.
- FailoverClustering: Event ID 1205 - The Cluster service failed to bring clustered role 'FS-03' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered role.
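In case it helps anyone reading the first event, here is a minimal Python sketch that pulls the ErrorCode values out of the Event ID 18590 message and prints them in hex and decimal. The function name `parse_error_codes` and the `EVENT_18590` constant are just names I made up for illustration. As far as I can tell, ErrorCode0 is the guest's bug check code (0x9E is listed as USER_MODE_HEALTH_MONITOR) and the remaining values are its parameters. Reading parameter 2 (0x3C, i.e. 60) as a health-monitoring timeout in seconds is my assumption and should be checked against Microsoft's bug check reference.

```python
import re

# Message text copied from the Hyper-V-Worker Event ID 18590 entry above.
EVENT_18590 = (
    "'FS-03' has encountered a fatal error. The guest operating system "
    "reported that it failed with the following error codes: "
    "ErrorCode0: 0x9E, ErrorCode1: 0x6C2A17C0, ErrorCode2: 0x3C, "
    "ErrorCode3: 0xA, ErrorCode4: 0x0."
)


def parse_error_codes(message):
    """Return the ErrorCodeN values as a list of ints, ordered by N."""
    pairs = re.findall(r"ErrorCode(\d+):\s*0x([0-9A-Fa-f]+)", message)
    ordered = sorted(pairs, key=lambda pair: int(pair[0]))
    return [int(value, 16) for _, value in ordered]


codes = parse_error_codes(EVENT_18590)

# Assumption: ErrorCode0 is the bug check code, the rest are its parameters.
bugcheck, params = codes[0], codes[1:]

print(f"bug check: 0x{bugcheck:X}")
for index, value in enumerate(params, start=1):
    print(f"  parameter {index}: 0x{value:X} ({value})")
```

This only reformats what is already in the log entry; it does not diagnose anything by itself.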