Channel: High Availability (Clustering) forum
Viewing all 6672 articles

How to cluster Windows Server 2016 across two different hardware vendors (Dell vs. Lenovo servers)


I have a question about clustering between two different hardware vendors.

I have a Lenovo System x3650 M5 (5462) server running Windows Server 2016.

I also have another server, a Dell R740, which runs Windows Server 2016 as well.

My question is whether I can run a Windows Server 2016 failover cluster across the Lenovo and Dell servers.

Thanks in advance for any technical advice.



Need help extending a clustered shared volume



Hello everyone,

I am new to Cluster Shared Volumes in Server 2012 R2. I am trying to expand an existing volume or create a new one.

I have tried to use diskpart to expand the V$, but I keep getting an error that there is no space available to extend the volume.

This volume is on a 12 TB SAN, and I can see that 1.2 TB are available.

Does anyone know what I am missing? I don't understand why the free space is visible on the machine but not within diskpart.
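For anyone hitting the same wall: diskpart only sees new free space after the LUN itself has been grown on the SAN side and the node has rescanned it, and the extend has to run on the node that currently owns the CSV. A rough PowerShell sketch of that sequence (the CSV and mount-point names below are placeholders, not taken from the post):

```powershell
# Sketch, assuming the LUN has already been expanded on the SAN.
# Run on the node that currently owns the CSV.

Get-ClusterSharedVolume | Format-Table Name, OwnerNode, State   # find the owner

Update-HostStorageCache    # rescan so Windows sees the new LUN size

# Locate the partition behind the CSV mount point and grow it to the maximum:
$part = Get-Partition |
    Where-Object { $_.AccessPaths -like "*ClusterStorage\Volume1*" }
$max = ($part | Get-PartitionSupportedSize).SizeMax
$part | Resize-Partition -Size $max
```

If Get-PartitionSupportedSize reports no extra space, the visible 1.2 TB is likely still unallocated in the SAN pool rather than added to this LUN.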


Windows file share witness is not accessible after patching


Hi Experts,

Our Windows team applied patches to the two nodes of a cluster. After patching, the file share witness is accessible from one server but not from the other.

After a deep dive, we could see that the extra security patches below were applied on the server where the file share witness is not accessible.

KB3161949
KB3172729
KB3173424
KB3175024
KB4338824
KB4499165
KB4503290



We are not sure which patch is creating this problem, as we don't see any official Microsoft documentation on this. Please let us know if you have any information on this matter.

Also, please advise if there is a forum where we can check bug details quickly.
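Several of those KBs tighten SMB and authentication behavior, so one way to narrow it down is to test the witness path from both nodes and compare. A diagnostic sketch (the server and share names here are placeholders):

```powershell
# Run on each node and compare results; \\FSW01\Witness is a placeholder path.

# Confirm the configured witness path:
Get-ClusterResource -Name "File Share Witness" | Get-ClusterParameter SharePath

# Basic SMB reachability from this node:
Test-NetConnection -ComputerName FSW01 -Port 445

# Can this node open the share at all?
Test-Path \\FSW01\Witness
```

Note that the cluster accesses the witness in the context of the cluster name object (CNO), so also verify the CNO still has share and NTFS permissions on the witness folder.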

Many thanks in advance ! 

Regards,
Naren poosa

Problem running Update-ClusterFunctionalLevel on Server 2019


Hi

I have in-place upgraded a 2-node SQL cluster (from Server 2016 Std. to Server 2019 Std.). The whole process worked as expected.

Now I want to run Update-ClusterFunctionalLevel, but it is returning the following error:

Update-ClusterFunctionalLevel : You do not have administrative privileges on the cluster. Contact your network
administrator to request access.
    Access is denied
At line:1 char:1
+ Update-ClusterFunctionalLevel
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : AuthenticationError: (:) [Update-ClusterFunctionalLevel], ClusterCmdletException
    + FullyQualifiedErrorId : ClusterAccessDenied,Microsoft.FailoverClusters.PowerShell.UpdateClusterFunctionalLevelCommand


In the Microsoft-Windows-FailoverClustering/Diagnostic event log it gives me the following error:

EventID: 2051

Description: [CORE] mscs::ClusterCore::VersionUpgradePhaseTwo: (5)' because of 'Gum handler completed as failed'

I think all permissions are correct, but I can't find the root cause. Can you please help me?
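For what it's worth, "Access is denied" from Update-ClusterFunctionalLevel is often about how the session was started rather than the cluster itself. A checklist sketch (nothing below is specific to this cluster):

```powershell
# Run in an elevated PowerShell session on a cluster node, as a domain account
# that is a local Administrator on every node.

# 1. Confirm the account has full cluster access:
Get-ClusterAccess

# 2. Confirm both nodes are up and already running the 2019-level binaries:
Get-ClusterNode | Format-Table Name, State, MajorVersion, MinorVersion, BuildNumber

# 3. Preview, then commit:
Update-ClusterFunctionalLevel -WhatIf
Update-ClusterFunctionalLevel
```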



Failover Clustering Task Scheduler Survey

netft.sys is the cause of a bug check blue screen on Windows Server 2008 R2 Datacenter


Hi

We have a server getting rebooted by a bug check error in netft.sys. Please let me know if there is a fix for this issue; I am not sure what is causing it.

The server runs Windows Server 2008 R2 Datacenter and is part of a Hyper-V cluster.

Thanks in advance

Some cluster networks with unavailable status

Hello. We set up a failover cluster with Windows Server 2012 R2 and all networks were "Up"; however, we realized that we were not able to do Live Migration. When we checked the Cluster Networks section, we saw several interfaces with a status of "Unavailable". Yet when we test access to these interfaces, they are normal and accessible. We have already checked anti-virus and firewall on all cluster nodes: there are no anti-virus restrictions and the firewall is disabled.

Screenshot attached.

NOTE: I already did what is on http://blog.mpecsinc.ca/2010/03/nic-binding-order-on-server-core-error.html

NOTE 2: This is only happening on some interfaces of "Cluster Network 1", "Cluster Network 2" and "Cluster Network 3"; all interfaces are "Up" at the OS level.
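As a starting point, it can help to compare what the cluster itself thinks about each network and interface with what the OS reports. A quick sketch:

```powershell
# How the cluster classifies each network (Role: 0 = none, 1 = cluster only,
# 3 = cluster and client) and its state:
Get-ClusterNetwork | Format-Table Name, State, Role, Address

# Per-node view of each adapter as the cluster sees it:
Get-ClusterNetworkInterface | Format-Table Name, Node, Network, State

# Which networks are excluded from Live Migration:
Get-ClusterResourceType -Name "Virtual Machine" |
    Get-ClusterParameter -Name MigrationExcludeNetworks
```

If an interface shows Unavailable only on some nodes, comparing the per-node view above usually points at the node whose adapter the cluster cannot heartbeat over.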

Guest file server cluster constant crashes


Hi

I have built a guest file server cluster on Windows Server 2019. The cluster is in constant trouble: it becomes very slow, crashes, and finally crashes all my hypervisor servers...

Hypervisor infrastructure:

  • 3 hosts running Windows Server 2019 LTSC Datacenter
  • 10 Gb iSCSI storage with 11 LUNs
  • cluster passes all validation tests

Guest file server cluster, 2 VM with the same config:

  • Generation 2 VMs with Windows Server 2019 LTSC
  • 4 virtual CPUs
  • 8 GB of non-dynamic RAM
  • 1 SCSI controller
  • primary hard drive: VHDX format, SCSI controller, ID 0
  • empty DVD drive on SCSI controller, ID 1
  • 10 VHDS (shared) disks on SCSI controller, IDs 2 to 11, same IDs on each node
  • 1 network card on a virtual switch routed to 4 physical teamed network cards
  • cluster passes all validation tests except the network test, which flags one point of failure for non-redundancy


After some time, the cluster becomes very slow, crashes, and takes all my hypervisors down with it. The only errors returned by Hyper-V say that some LUNs became unavailable due to a timeout, with this message (translated from French):

Cluster Shared Volume 'VSATA-04' ('VSATA-04') has entered a paused state because of 'STATUS_IO_TIMEOUT(c00000b5)'. All I/O will temporarily be queued until a path to the volume is reestablished.

I have checked every single parameter in the VM and Hyper-V configuration, and chased every hint the logs gave me, but found nothing, and the crashes continue...

Sorry for my poor English; it is not my first language.

Zero Downtime File Server - Would this setup work?


Hello everybody,

I was given the task of planning a redundant file storage environment that can compensate for the failure of any component without service interruption. This is a field I have little experience with, so I want to confirm that the concept I am working on actually works. I don't have the resources to build a test system at the moment either, making this a very theoretical construct.

I want to use a Windows Failover Cluster with a Scale Out File Server role installed. Three physical servers with limited storage space for only the operating system are supposed to be the nodes of this cluster (three as to avoid using a file witness). A single SAN storage solution will provide the storage space for the file server, attached to the individual nodes via fibre channel. The SAN storage itself has all components built in redundantly, eliminating the need to provide a second storage unit and managing the synchronization of both.

The clients are expected to then connect to the file service provided by the cluster which is then (transparently) handled by any of the nodes and, in case of failure of this node (e.g. loss of power), instantly taken over by another without interruption or considerable delay.

In case it is important: The file server is supposed to host files of different applications including resources and configurations. These applications are not run on the server, but on clients. They are executed FROM the server share though, so constant and uninterrupted file provision is required, otherwise the applications will eventually crash. Executing from the server share is mandatory.

Now, as I mentioned, my experience with this is rather limited, and while the concept is based on what I read in the Microsoft documentation, I would like to ask you for confirmation that this will work or, in case it doesn't, advice on what to do differently.

Additionally, as far as I understand, running a domain controller role on the same server that is running a Scale-Out File Server role is not possible, or at least not recommended. Is this still valid for Server 2019, and if so, is there a way to achieve zero-downtime file provisioning on the same device that is running a DC, or do they have to be separate machines?
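The design reads like a standard Scale-Out File Server deployment. A minimal sketch of the build might look like the following (node names, cluster name, IP and share name are all invented placeholders). One caveat worth flagging: Microsoft positions SOFS for application data workloads (Hyper-V, SQL Server), so test your execute-from-share workload carefully before committing.

```powershell
# Minimal sketch; all names and the address are placeholders.

# Validate and create the three-node cluster:
Test-Cluster -Node FS01, FS02, FS03
New-Cluster -Name FSCLUS -Node FS01, FS02, FS03 -StaticAddress 10.0.0.50

# Bring the FC-attached SAN disk in and convert it to a Cluster Shared Volume:
Get-ClusterAvailableDisk | Add-ClusterDisk
Add-ClusterSharedVolume -Name "Cluster Disk 1"

# Create the Scale-Out File Server role and a continuously available share:
Add-ClusterScaleOutFileServerRole -Name SOFS01
New-SmbShare -Name Apps -Path C:\ClusterStorage\Volume1\Apps `
    -ContinuouslyAvailable $true -FullAccess "DOMAIN\AppUsers"
```

SMB Transparent Failover (the "no interruption" piece) requires SMB 3.x clients, i.e. Windows 8 / Server 2012 or later.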

Thanks in advance!

Windows Server 2016 cluster system Failover Cluster Validation Report shows error on the CNO


Hi All,

I'm having an issue with my Windows Server 2016 cluster system.
It consists of 2 nodes, let's say Node1 (showing as down) and Node2 (up).

Node1 is ping-able to Node2 and vice versa, but not sure why it is showing as down.

The Fail-over Cluster Validation Report shows error only on the below CNO:

  • The cluster network name resource 'PRDSQL-CLUS01' has issues in the Active Directory. The account could have been disabled or deleted. It could also be because of a bad password. This might result in a degradation of functionality dependent on the cluster network name. Offline the cluster network name resource and run the repair action on it. 
    An error occurred while executing the test.
    The operation has failed. An error occurred while checking the state of the Active Directory object associated with the network name resource 'Cluster Name'.

    Access is denied
This is the error logged from the Failover Cluster Manager.

Event ID 1069

Cluster resource 'Cluster Name' of type 'Network Name' in clustered role 'Cluster Group' failed. Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it. Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Event ID 1688
Cluster network name resource detected that the associated computer object in Active Directory was disabled and failed in its attempt to enable it. This may impact functionality that is dependent on Cluster network name authentication.
Network Name: Cluster Name
Organizational Unit:
Guidance: Enable the computer object for the network name in Active Directory.

The virtual cluster front end, called PRDSQL-CLUS01, is reporting that it is disabled in Active Directory, as per the above error.
 
I have tried:

  • Taking the virtual endpoint offline and running a repair, but the errors state "File not Found" and "Error Displaying Cluster Information".
  • Creating a blank role; SQL and CAU are still working, and it is only the front-end failover cluster virtual network name AD account (CNO) that is having the issue.
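The usual recovery path for a disabled or deleted CNO is to fix the computer object in AD first and only then run the repair, logged on with an account that can reset that object's password. A sketch (the account name follows the post; adjust to your environment):

```powershell
# 1. Re-enable (or restore from the AD Recycle Bin) the CNO computer object.
#    Requires the ActiveDirectory module:
Enable-ADAccount -Identity "PRDSQL-CLUS01$"

# 2. With the name resource offline, run the repair from Failover Cluster
#    Manager (right-click the name resource > More Actions > Repair), using an
#    account with rights to reset the computer object's password:
Stop-ClusterResource -Name "Cluster Name"
# ... Repair via the GUI ...
Start-ClusterResource -Name "Cluster Name"
```

If the repair still returns "File not Found", check that the CNO was restored into the OU recorded on the cluster, since the repair looks for it there.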

Any help would be greatly appreciated.

Thanks,


/* Server Support Specialist */

Cluster IP keep switching


Dear All,

I have a cluster name with 2 IPs: one active and one passive. When I do an NSLOOKUP I get both IPs, and when I ping the cluster name it resolves to the passive IP, not the active one (the passive IP is not pingable, so the ping times out). I deleted both A records in DNS and it worked fine for a while, but then it went back to resolving to the passive IP again. What I need is for a ping of the cluster name to always hit the active IP.
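This sounds like the network name resource registering all of its provider IPs in DNS. If so, the standard fix is to turn that off so only the online IP gets registered, combined with a short TTL. A hedged sketch ("Cluster Name" is the generic resource name; substitute your role or listener name):

```powershell
# Register only the currently online IP, with a 5-minute TTL on the record:
Get-ClusterResource "Cluster Name" |
    Set-ClusterParameter -Name RegisterAllProvidersIP -Value 0
Get-ClusterResource "Cluster Name" |
    Set-ClusterParameter -Name HostRecordTTL -Value 300

# The name resource must be cycled for the change to take effect:
Stop-ClusterResource "Cluster Name"
Start-ClusterResource "Cluster Name"
```

After cycling the resource, delete the stale A record once; it should not come back.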

Thank you 

Cluster shared storage issue


Hi

I have a Windows Server 2012 R2 cluster with 7 drives shared from SAN storage.

Now I am not able to open any of the 7 drives from any node; I get the error below:

C:\ClusterStorage\Volume1 is not accessible.

The referenced account is currently locked out and may not be logged on to.
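That error message points at an Active Directory account lockout rather than a storage fault, and CSV access depends on the cluster's computer accounts. A quick sketch to check for and clear a lockout (the account name is a placeholder):

```powershell
# Requires the ActiveDirectory module; run as a domain admin.
Search-ADAccount -LockedOut | Select-Object Name, SamAccountName

# If the cluster name object (CNO) or a node account shows up, unlock it:
Unlock-ADAccount -Identity "MYCLUSTER$"
```

It is also worth finding what keeps locking the account (bad-password events on the domain controllers), or the lockout will recur.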

Not able to rebuild cluster, issue on disks ?


Hi all,

I have two Windows Server 2012 R2 machines (DB1A and DB1B) where a failover cluster plus SQL Server Availability Groups used to work. But something went wrong (I don't really know what, maybe an aggressive GPO) and the cluster was totally dead.

When I try to rebuild it, I get this kind of warning:

List Disks To Be Validated
Physical disk ab780ec8 is visible from only one node and will not be tested. Validation requires that the disk be visible from at least two nodes. The disk is reported as visible at node: DB1A
Physical disk ab780ec0 is visible from only one node and will not be tested. Validation requires that the disk be visible from at least two nodes. The disk is reported as visible at node: DB1A
No disks were found on which to perform cluster validation tests. To correct this, review the following possible causes:
* The disks are already clustered and currently Online in the cluster. When testing a working cluster, ensure that the disks that you want to test are Offline in the cluster.
* The disks are unsuitable for clustering. Boot volumes, system volumes, disks used for paging or dump files, etc., are examples of disks unsuitable for clustering.
* Review the "List Disks" test. Ensure that the disks you want to test are unmasked, that is, your masking or zoning does not prevent access to the disks. If the disks seem to be unmasked or zoned correctly but could not be tested, try restarting the servers before running the validation tests again.
* The cluster does not use shared storage. A cluster must use a hardware solution based either on shared storage or on replication between nodes. If your solution is based on replication between nodes, you do not need to rerun Storage tests. Instead, work with the provider of your replication solution to ensure that replicated copies of the cluster configuration database can be maintained across the nodes.
* The disks are Online in the cluster and are in maintenance mode.
No disks were found on which to perform cluster validation tests.

When I open Failover Cluster Manager, I can see the two nodes but nothing in the Roles folder, nor any Disks.

Of course, SQL Server Availability Groups are not possible:


The local node is not part of quorum and is therefore unable to process this operation. This may be due to one of the following reasons:
•   The local node is not able to communicate with the WSFC cluster.
•   No quorum set across the WSFC cluster.

I'm a bit lost. It would be great if someone could help.
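One point that may help: an Availability Group cluster normally uses local, non-shared disks on each replica, so "disk is visible from only one node" is expected and the storage tests can simply be skipped. If the old configuration is truly dead, a clean rebuild might be sketched like this (destructive; the node names are from the post, the cluster name and IP are placeholders):

```powershell
# On each node, wipe the remnants of the dead cluster configuration:
Clear-ClusterNode -Force

# Validate without the shared-storage tests (AG replicas use local disks):
Test-Cluster -Node DB1A, DB1B -Ignore Storage

# Re-create the cluster without claiming any disks:
New-Cluster -Name DBCLUS -Node DB1A, DB1B -NoStorage -StaticAddress 10.0.0.60
```

Afterwards, re-enable AlwaysOn Availability Groups in SQL Server Configuration Manager on both instances and re-create the AG.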

Live Migration and WorkGroup Cluster on windows 2019


Hi ,

I found the following document about live migration and workgroup clusters on Windows Server 2016.

https://techcommunity.microsoft.com/t5/Failover-Clustering/Workgroup-and-Multi-domain-clusters-in-Windows-Server-2016/ba-p/372059

I understand that live migration is not supported, only quick migration. Is this the same on Windows Server 2019, or are there any plans to change it?


Drive on all nodes in SQL Availability Group "Formatted" at the same time (Cluster on Windows 2016 standard)


We have a 2 node SQL Availability Group on a Windows 2016 Std Cluster.

SQL Server reported the databases suspect after the data drives on both servers appeared to have been formatted.

On one of the servers we found the following events:

Event ID 7036 on 7/26/2019 at 9:37:55AM

Event ID 98 on 7/26/2019 at 9:38:12AM

Event ID 98 on 7/26/2019 at 9:38:13AM

These appear to indicate that the drive was formatted.

We have tested and found that running the PowerShell "Format-Volume" command (locally or remotely) against one server causes the same drive on both nodes in the cluster/AG to be formatted.

One possible cause is a server build script has been run with incorrect server details and we are investigating this possibility.

My questions are:

Has anyone experienced drives being "Formatted" simultaneously across nodes in a Clustered SQL AG?

Is the formatting of drives on an Availability Group supposed to affect all nodes? I've not found documentation to explain this.
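Regardless of the root cause, build scripts that format volumes can be made to fail loudly instead of silently wiping a clustered drive. A defensive sketch (drive letter and label are examples):

```powershell
# Inspect exactly what would be formatted before doing it:
Get-Volume -DriveLetter D |
    Format-List DriveLetter, FileSystemLabel, Path, Size

# Format-Volume honors -WhatIf, so a build script can dry-run first and abort
# if the label or size does not match what it expects:
Format-Volume -DriveLetter D -NewFileSystemLabel Data -WhatIf
```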


How to automate actions based Cluster Validation Test results?


In windows clustering you can run a "Cluster Validation Report" either from the Cluster Administration Console or from PowerShell using Test-Cluster.

However, the output is an .htm file, which isn't really super helpful compared to getting a list of True/False values like you would expect from a "proper" PowerShell cmdlet 😉

So, my question is whether anyone knows of a way to pass the results from Test-Cluster on, so I can build something that can fix the settings that failed?
Or do I really have to choose between reinventing the wheel by creating a bunch of tests myself, and manually reading a report?

I find it hard to believe that this is something that hasn't been automated yet.

I have been googling fairly hard, but haven't been able to find any existing tooling for this.
(I did suggest fixing our build pipeline so we could have a success-rate higher than 15% on new clusters, but apparently that's not popular ¯\_(ツ)_/¯)

ps. currently I'm looking at whether I can parse the htm file that is output, but meh -__-
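As a sketch of the "parse the htm" idea: the validation report is mostly tables of test-name/result pairs, so a small scraper can turn it into data you can branch on. The exact markup varies by OS version, so the pattern below is an assumption to adjust against a real report:

```python
import re

def parse_validation_report(html: str) -> dict:
    """Return {test_name: result} from rows shaped like
    <td>Test Name</td><td>Success|Warning|Failed</td>."""
    pattern = re.compile(
        r"<td[^>]*>([^<]+)</td>\s*<td[^>]*>(Success|Warning|Failed)</td>",
        re.IGNORECASE,
    )
    return {name.strip(): result for name, result in pattern.findall(html)}

# Tiny stand-in for a real report:
sample = """
<table>
  <tr><td>List Disks</td><td>Failed</td></tr>
  <tr><td>Validate Network Communication</td><td>Success</td></tr>
</table>
"""
failed = [t for t, r in parse_validation_report(sample).items() if r == "Failed"]
```

From there, a dispatch table mapping test names to remediation scriptblocks gets you the self-healing piece.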

Can I migrate a VM from another node to current active node?


Hi

Is it possible to migrate a VM from another node in the cluster to the currently active node of a Hyper-V cluster?

Any clue is highly appreciated.
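Yes: Move-ClusterVirtualMachineRole takes a -Node parameter, so you can name the node you are currently on as the destination. A sketch (the role name "VM01" is a placeholder):

```powershell
# Pull the clustered VM "VM01" onto the node this session runs on:
Move-ClusterVirtualMachineRole -Name "VM01" -Node $env:COMPUTERNAME -MigrationType Live

# -MigrationType also accepts Quick or Shutdown if live migration isn't possible.
```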

Thanks

Problem with virtual disk on 4 node cluster.

Hi Guys



I am going out of my mind. I've been struggling with this for days, unable to find anything that puts me on the right path.

My cluster was powered down; on startup, a virtual disk got stuck in an "Online pending" -> "Failed" -> "Online pending" loop. It then tries to start on another server, so it keeps bouncing around all 4 servers.



I have tried almost every article I could find. When running Get-StorageJob, I have one job that keeps running:

Name   IsBackgroundTask ElapsedTime JobState PercentComplete BytesProcessed BytesTotal
----   ---------------- ----------- -------- --------------- -------------- ----------
Repair True             00:01:25    Running  0               0              45097156608



It seems that every 2-3 minutes the job restarts. I am getting the following in the event log (sorry for the missing screenshots; I was not allowed to post them):

EventID: 1069

Cluster resource 'Cluster Virtual Disk (HyperVDisk1)' of type 'Physical Disk' in clustered role '96fd0e69-9c2d-41c0-92e3-09bdcd126686' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.



EventID: 5142

Cluster Shared Volume 'HyperVdisk1' ('Cluster Virtual Disk (HyperVDisk1)') is no longer accessible from this cluster node because of error '(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.



EventID: 5142

Cluster Shared Volume 'HyperVdisk1' ('Cluster Virtual Disk (HyperVDisk1)') is no longer accessible from this cluster node because of error '(1460)'. Please troubleshoot this node's connectivity to the storage device and network connectivity.



EventID: 1793

Cluster physical disk resource online failed.

Physical Disk resource name: Cluster Virtual Disk (HyperVDisk1)
Device Number: 5
Device Guid: {a75e8b5d-a226-4b0e-b6d4-cde8fffa4d1b}
Error Code: 5008
Additional reason: WaitForVolumeArrivalsFailure



EventID: 1795

Cluster physical disk resource terminate encountered an error.

Physical Disk resource name: Cluster Virtual Disk (HyperVDisk1)
Device Number: 5
Device Guid: {a75e8b5d-a226-4b0e-b6d4-cde8fffa4d1b}
Error Code: 1168



What I have tried:

This article from kreelbits: storage-spaces-direct-storage-jobs-hung

Tried Optimize-StoragePool and Repair-VirtualDisk, with no success.



Found a great article from JTpedersen on troubleshooting-failed-virtualdisk-on-a-storage-spaces-direct-cluster



Every time I try to run:
Remove-ClusterSharedVolume -Name "Cluster Virtual Disk (HyperVDisk1)"

one time I got a message that the job failed because the disk was moving to another server (not the exact wording).

The normal response is that it just hangs on the command; it has been doing that for over 24 hours.



To me it seems the problem is that, before any command can get hold of the disk, the storage job restarts or the disk moves to another server and the loop starts over.
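One approach that may break that loop: tell the cluster to stop auto-restarting the resource, hold it offline, and only then work on the virtual disk. A sketch using the names from the events above (verify the RestartAction semantics on your build before relying on it):

```powershell
$res = Get-ClusterResource -Name "Cluster Virtual Disk (HyperVDisk1)"

# 0 = do not restart on failure; stops the online/failed/online bouncing:
$res.RestartAction = 0
Stop-ClusterResource -InputObject $res

# With the resource held offline, inspect and repair the virtual disk:
Get-StorageJob
Repair-VirtualDisk -FriendlyName HyperVDisk1

# Restore normal failover behaviour afterwards (2 = restart and fail over):
$res.RestartAction = 2
```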



Thanks in advance.



/Peter





WMI Equivalent of powershell Start-ClusterResource and Move-ClusterVirtualMachineRole


Hi,

I want to use these two commands, 1) Start-ClusterResource and 2) Move-ClusterVirtualMachineRole, to first start a VM and then move it to the active host.

If I move the VM without starting it, it does not move; so I first start it and then move it. That works, but:

How can I do this using WMI? What are their WMI equivalents?

Please help
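The failover cluster WMI provider lives in the root\MSCluster namespace. To the best of my knowledge the counterparts are MSCluster_Resource.BringOnline and MSCluster_ResourceGroup.MoveToNewNode, but treat the sketch below as a starting point to verify against your OS version (resource, group and node names are placeholders):

```powershell
# Connections to root\MSCluster require packet-privacy authentication.

# Equivalent of Start-ClusterResource:
$res = Get-WmiObject -Namespace root\MSCluster -Authentication PacketPrivacy `
    -Class MSCluster_Resource -Filter "Name='Virtual Machine VM01'"
$res.BringOnline(120)          # timeout in seconds

# Rough equivalent of Move-ClusterVirtualMachineRole (a plain group move;
# a true live migration has additional parameters to investigate):
$grp = Get-WmiObject -Namespace root\MSCluster -Authentication PacketPrivacy `
    -Class MSCluster_ResourceGroup -Filter "Name='VM01'"
$grp.MoveToNewNode("HOST02")
```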


Get-Volume returns all volumes within Windows Failover Cluster instead of just local

Hello all.

This is my first entry in the forums, so apologies if I miss something or have this in the wrong place.

I am using the Get-Volume command in PowerShell to return all the volumes located on the server I am running it from.

However, our servers are members of Windows Failover Clusters.

On one of our clusters it does what I would expect: we get a list of all the volumes on that particular node. On the other cluster we get a list of all volumes within the cluster.

Does anyone know of any setting in the Windows failover cluster (or anywhere else) that could explain the difference in behavior?
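One explanation worth checking: on clusters with clustered storage (for example Storage Spaces Direct), the Storage module talks to a cluster-wide "Clustered Windows Storage" subsystem, so Get-Volume enumerates every volume in the cluster; on plain clusters it only sees the local subsystem. A sketch to compare the two clusters (whether Get-Volume accepts a piped subsystem depends on your Storage module version):

```powershell
# List the storage subsystems this node can see; a clustered subsystem here
# would explain the cluster-wide volume list:
Get-StorageSubSystem | Format-Table FriendlyName, HealthStatus

# Try scoping the query to the local machine's subsystem only:
Get-StorageSubSystem -FriendlyName "Windows Storage*" | Get-Volume
```

The same difference would also explain the New-Volume behavior: on the clustered subsystem, New-Volume tries to create cluster-aware storage, hence the "Failover clustering could not be enabled for this storage object" error.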



In addition, if I try to create a new volume using New-Volume (PowerShell) in the cluster that behaves as expected, it works without issue.

If I try to create a new volume using New-Volume in the cluster that shows all volumes, I get the error below:



Failover clustering could not be enabled for this storage object.
Activity ID: {<blanked>}
    + CategoryInfo          : NotSpecified: (:) [New-Volume], CimException
    + FullyQualifiedErrorId : StorageWMI 46008,Microsoft.Management.Infrastructure.CimCmdlets.InvokeCimMethodCommand,New-Volume
    + PSComputerName        : <blanked>




Any help on this would be greatly appreciated.

Thank you.



