Hi guys,
I am going out of my mind. I have been struggling with this for days and cannot find anything that puts me on the right path.
My cluster was powered down while it was starting up, and that left a virtual disk stuck in an "Online Pending" -> "Failed" -> "Online Pending" loop. After each failure the cluster tries to start it on another server, so it keeps bouncing around all 4 servers.
I have tried almost every article I could find. When running Get-StorageJob I have 1 job that keeps running:
Name   IsBackgroundTask ElapsedTime JobState PercentComplete BytesProcessed BytesTotal
----   ---------------- ----------- -------- --------------- -------------- ----------
Repair True             00:01:25    Running  0               0              45097156608
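For anyone who wants to look at the same state in one pass, here is a minimal sketch. It assumes the virtual disk's friendly name is HyperVDisk1 and the cluster resource name is 'Cluster Virtual Disk (HyperVDisk1)', as they appear in the events in this post; the variable names are only illustrative. It is read-only and does not try to fix anything.

```powershell
# Assumed names, taken from the event log entries in this post.
$diskName     = 'HyperVDisk1'
$resourceName = 'Cluster Virtual Disk (HyperVDisk1)'

# Storage jobs currently known to the storage subsystem (the Repair job shows up here).
Get-StorageJob

# Health and operational state of the virtual disk itself.
Get-VirtualDisk -FriendlyName $diskName |
    Select-Object FriendlyName, OperationalStatus, HealthStatus, DetachedReason

# State and current owner node of the Cluster Shared Volume.
Get-ClusterSharedVolume -Name $resourceName
```

Running this a few times in a row should show the elapsed time of the Repair job resetting and the owner node changing as the disk moves between servers.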
It seems that every 2-3 minutes the job restarts. I am getting the following in the event log (sorry for the missing pictures, I was not allowed to post them):
EventID: 1069
Cluster resource 'Cluster Virtual Disk (HyperVDisk1)' of type 'Physical Disk' in clustered role '96fd0e69-9c2d-41c0-92e3-09bdcd126686' failed.
Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.
EventID: 5142
Cluster Shared Volume 'HyperVdisk1' ('Cluster Virtual Disk (HyperVDisk1)') is no longer accessible from this cluster node because of error '(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.
EventID: 5142
Cluster Shared Volume 'HyperVdisk1' ('Cluster Virtual Disk (HyperVDisk1)') is no longer accessible from this cluster node because of error '(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.
EventID: 1793
Cluster physical disk resource online failed.
Physical Disk resource name: Cluster Virtual Disk (HyperVDisk1)
Device Number: 5
Device Guid: {a75e8b5d-a226-4b0e-b6d4-cde8fffa4d1b}
Error Code: 5008
Additional reason: WaitForVolumeArrivalsFailure
EventID: 1795
Cluster physical disk resource terminate encountered an error.
Physical Disk resource name: Cluster Virtual Disk (HyperVDisk1)
Device Number: 5
Device Guid: {a75e8b5d-a226-4b0e-b6d4-cde8fffa4d1b}
Error Code: 1168
What I have tried:
This article from kreelbits: storage-spaces-direct-storage-jobs-hung
I tried Optimize-StoragePool and Repair-VirtualDisk with no success.
I found a great article from JTpedersen: troubleshooting-failed-virtualdisk-on-a-storage-spaces-direct-cluster
From that article I have repeatedly tried to run:
Remove-ClusterSharedVolume -Name "Cluster Virtual Disk (HyperVDisk1)"
Once I got an error saying that the job failed because the disk was moving to another server (not the exact wording).
Normally the command just hangs, and it has now been hanging for more than 24 hours.
To me it seems the problem is that before any command can get hold of the disk, the cluster restarts the storage job and moves the disk to another server, which restarts the loop.
Thanks in advance.
/Peter