Channel: High Availability (Clustering) forum

Windows 10 offline network drive file server cluster name nightmare


Hello,

My initial thread : https://social.technet.microsoft.com/Forums/en-US/651ed135-e72a-4371-838e-a8670c5070c2/windows-10-offline-network-drive-nightmare?forum=win10itpronetworking

To summarize  :

My problem only happens on Windows 10 (regardless of hardware, domain membership, installed software, or build version) AND with a network drive mapped to the cluster name of our 2012 R2 file servers (it works fine with a drive mapped directly to any node of the cluster).

Here is the problem: when a workstation has a network drive mounted and is disconnected from the network (cable unplugged), the OS constantly tries to reach the file server instead of marking the drive as disconnected after a few attempts. The PC becomes slow and unresponsive until the network comes back. Since we have, for example, Word configured to save files to the network drive by default, users are unable to save a document because Word just waits forever for the network drive to become accessible again.
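For what it's worth, here is the kind of side-by-side test I've been running to compare the two cases (server and share names below are placeholders):

# Map one drive to the cluster client access point and one directly to a node:
New-SmbMapping -LocalPath "X:" -RemotePath "\\FILECLUSTER\Share"
New-SmbMapping -LocalPath "Y:" -RemotePath "\\FILENODE1\Share"
# After pulling the cable, check how the SMB client sees each mapping/connection:
Get-SmbMapping | Format-Table LocalPath, RemotePath, Status
Get-SmbConnection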

Regards


Adding a new node to existing cluster


Hello,

I have an existing, functioning single-node SQL 2014 cluster running on Server 2012 R2 with iSCSI storage as the backend disk. I'm now at the point of needing to add a second node to this cluster.

When launching the SQL installation wizard to add a node to an existing cluster, it fails validation because it says the node is not part of a failover cluster. I thought that the wizard was going to do this for me, but ok, fair enough...I'll do that part myself.

What I'm confused about is the proper sequence for adding a node to an existing cluster with shared iSCSI disks. If I map the existing iSCSI disks to my new node before it's part of the cluster, there's a risk of conflict/corruption because two nodes would be accessing the disks at the same time. However, it also appears that I can't run the full validation in the Add Node wizard without the disks first being mapped to the new node. I could skip validating the storage, but isn't that going to cause problems for the cluster if the new node comes online and isn't able to access the quorum disk?

It seems that I'm stuck in a catch-22, and Microsoft's documentation makes absolutely no mention of what to do in this scenario.
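For reference, this is the cluster-side sequence I assume has to happen before the SQL "Add node" wizard will pass validation (a sketch only; names are placeholders, and the storage mapping is exactly the part I'm unsure about):

# On the new node:
Install-WindowsFeature Failover-Clustering -IncludeManagementTools
# From either node (the storage tests could optionally be skipped here):
Test-Cluster -Node EXISTINGNODE, NEWNODE
# Join the new node to the existing cluster:
Add-ClusterNode -Cluster SQLCLUSTER -Name NEWNODE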

Thanks for any assistance.

Graceful/soft shutdown of Windows Server 2012 R2


I use a Windows Hyper-V cluster with Windows Server 2012 R2.

When the Windows Server is not clustered, graceful shutdown (soft shutdown) works well.

When the Windows Server is clustered, graceful shutdown does not work.

In clustered mode, graceful shutdown only works during the first 2 hours after the last OS shutdown or reboot. This could be an authentication timeout.

The local security policy "Shutdown: Allow system to be shut down without having to log on" has no influence on this behavior.

With VMware ESXi 6.7, graceful shutdown works fine.

To trigger the graceful shutdown I use Cisco UCSM or an IPMI tool, which sends the soft shutdown signal via Cisco CIMC and ACPI to the Windows OS.

How can I trace (examine) the ACPI soft shutdown signal on the Windows Server side?
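In case it helps frame the question, this is roughly where I have been looking on the Windows side so far; the event IDs below are my assumption of the relevant ones, not an official list:

# Check whether any shutdown request reaches Windows at all
# (1074 = shutdown initiated, 109 = kernel power shutdown transition, 41 = unexpected power loss):
Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 1074, 109, 41 } -MaxEvents 20 |
    Format-Table TimeCreated, Id, ProviderName, Message -Wrap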

last update: 2018-07-02

Windows 2012 R2 Cluster Fails - RDM Disks


Hi Everyone,

We have a 2-node Windows 2012 R2 failover cluster configured with shared RDMs running on VMware. All of a sudden, the resources failed over to the other node.

Below is the sequence of events that was triggered.

Ownership of cluster disk 'Cluster Disk 3' has been unexpectedly lost by this node. Run the Validate a Configuration wizard to check your storage configuration.

Cluster resource 'Cluster Disk 3' of type 'Physical Disk' in clustered role 'XXXX' failed.

Based on the failure policies for the resource and role, the cluster service may try to bring the resource online on this node or move the group to another node of the cluster and then restart it.  Check the resource and group state using Failover Cluster Manager or the Get-ClusterResource Windows PowerShell cmdlet.

Ownership of cluster disk 'Cluster Disk 2' has been unexpectedly lost by this node. Run the Validate a Configuration wizard to check your storage configuration.

Ownership of cluster disk 'Cluster Disk 4' has been unexpectedly lost by this node. Run the Validate a Configuration wizard to check your storage configuration.

Can someone let me know what actually happened here?
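In case it helps, this is how I'm planning to collect more detail (a sketch; the time span and path are just examples):

# Pull the detailed cluster log covering the failure window (destination folder must exist):
Get-ClusterLog -Destination C:\Temp -TimeSpan 120
# And check the current state of the disk resources:
Get-ClusterResource | Where-Object ResourceType -eq 'Physical Disk' |
    Format-Table Name, State, OwnerNode, OwnerGroup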


During a cluster failover event, does a read-write domain controller need to be available?


Hello, can someone please advise on the following

I have a branch office with only read-only domain controllers.

I also have a Windows Server 2012 R2 setup with SQL Server Always On clustering.

If the connection back to my main office (and therefore to any read-write domain controllers) is offline when a cluster failover event occurs, are any records updated in AD (including DNS records) as part of the failover event?

If so, and only read-only domain controllers are available, I assume the failover event will not occur / will fail?
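For context, this is the quick check I run from a cluster node to see whether a writable DC is reachable (the domain name is a placeholder):

# Ask the locator specifically for a writable domain controller:
nltest /dsgetdc:CONTOSO.LOCAL /writable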

Please advise, thanks very much.

 

Cluster with two nodes


Hello...

I want to configure a cluster on two servers running Windows Server 2012 R2. Can I configure the cluster without shared storage?

The storage would be on the servers' local disks, with the data replicated between them. Does it work like this?
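This is roughly what I had in mind (a sketch; server names, cluster name and IP address are placeholders):

# Create the cluster without adding any eligible storage to it:
New-Cluster -Name CLUSTER01 -Node SERVER1, SERVER2 -NoStorage -StaticAddress 192.168.1.50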

S2D disk performance problem - grinding to a halt.


Hi All,

I've recently built a 2016 S2D 4-node cluster and have run into major issues with disk performance:

barely getting kb/s throughput (yep, kilo and a small b - dial-up modem speeds for disk access)

VMs are unresponsive

multiple other issues associated with disk access to the CSVs

The hardware is all certified and per Lenovo's most recent guidelines. Servers are ThinkSystem SR650; the networking is 100Gb/s with 2x Mellanox ConnectX-4 adapters per node and 2x Lenovo NE10032 switches; 12x Intel SSDs and 2x Intel NVMe drives per node for the storage pool. RoCE/RDMA, DCB, etc. are all configured per the guidelines and verified (as far as I can diagnose). It should be absolutely flying along.

I should point out that it was working OK (though with no thorough testing done) for approx. 1 week. The vm's (about 10 or so) were running fine and any file transfers that were performed were limited by the Gb/s connectivity to the file share source (on older equipment serviced by a 10Gb/s switch uplink and 1Gb/s NIC connections at the source). 

About 3pm yesterday I decided to configure Cluster-Aware Updating, and this may or may not have been a factor. The servers were already fully patched with the exception of 2 updates: KB4284833 and a definition update for Defender. These were installed and a manual reboot performed one node at a time. Ever since, I've had blue screens, nodes/pools/CSVs failing over, and almost non-existent disk throughput. There are no other significant errors in the event logs; there have been cluster alerts as things go down, but nothing that has led to a Google/Bing search for a solution. The immediate thought is going to be "it was KB4284833 what done it", but I'm not certain that is the cause.

Interestingly, when doing a file copy to/from the CSV volumes there is an initial spurt of disk throughput (nowhere near as fast as it should be - say up to 100MB/s, but it could equally be as low as 7MB/s) and then it dies off to kB/s and effectively 0. So it looks like there is some sort of cache that works to some extent, and then nothing.

I've been doing a lot of research for the past 24 hours or so - no smoking guns. I did find someone with similar issues that were traced back to the power mode settings; I've since set these to High Performance (rather than the default Balanced) but have seen no change (might be worth another reboot to double-check this though - will do that shortly).
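For completeness, these are the basic checks I've been running while it's slow (nothing exotic, just a sketch):

Get-StorageJob                                   # any rebuild/repair running in the background?
Get-VirtualDisk | Format-Table FriendlyName, OperationalStatus, HealthStatus
Get-PhysicalDisk | Format-Table FriendlyName, MediaType, OperationalStatus, HealthStatus
Get-NetAdapterRdma                               # is RDMA still enabled after the patching/reboots?
Get-SmbClientNetworkInterface | Format-Table InterfaceIndex, RdmaCapable, RssCapable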

Any suggestions or similar experience? 

Thanks for any help.

Event ID 80 Hyper-V-Shared-VHDX in SQL Server


Hello,

We have a VM running on Hyper-V 2012 R2, and we are facing the following issue after backup:

Event ID 80 Hyper-V-Shared-VHDX

Error attaching to volume. Volume: \Device\HarddiskVolumeShadowCopy62. Error: The specified request is not a valid operation for the target device..
Error attaching to volume. Volume: \Device\HarddiskVolumeShadowCopy61. Error: The specified request is not a valid operation for the target device..
This issue is happening on the cluster server; we are using a Veritas application for backup.

We keep the VM and its storage on the same node, but we are still facing the same issue.

https://www.experts-exchange.com/questions/29003325/Error-Log-in-Microsoft-Hyper-V-Shared-VHDX-section-after-backup.html

https://forums.veeam.com/veeam-backup-replication-f2/failed-to-invoke-func-t44712.html

We need to solve this issue. There is no impact from the backup itself, but we want to know why this happens after backup, and only in the cluster.
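For reference, this is how I'm pulling the Event ID 80 entries; I wasn't sure of the exact channel name, so the sketch below discovers it first:

# Find the Shared-VHDX event channel(s), then pull the ID 80 entries from them:
Get-WinEvent -ListLog *Shared-VHDX* | ForEach-Object {
    Get-WinEvent -FilterHashtable @{ LogName = $_.LogName; Id = 80 } -ErrorAction SilentlyContinue
} | Format-Table TimeCreated, Id, Message -Wrap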

Regards



Cluster Client access point and generic application resources disappeared


We have a Windows 2008 R2 cluster hosting some generic applications, such as Apache services and some OpenText applications. It is a node and disk majority cluster with 2 nodes and a witness disk.

Yesterday we were removing some disks which were part of the Windows 2008 R2 cluster. While removing disks, the process hung on a disk which was mounted on another cluster disk (another resource in the same application group). After five minutes, all of the disks (46 disks) were moved to Available Storage, and when I checked my application group I found it was showing just two resources, one cluster disk and one Apache application - both in a failed state.

There was an access point with a name and IP address which is now missing, along with another two generic applications which were part of this application group. Also, all cluster disks which were part of this application group have now been moved to Available Storage. I have checked the cluster registry key, and can find registry values for all those missing resources in the Resources hive.

I am still able to ping the client access point by name and IP, but I cannot find anything in the application group. I have tried cluster restarts and node restarts, but it still shows the same status. I have tried failover; though it moved between the two nodes, the status is the same.
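For reference, this is how I compared what the cluster database still knows about with what Failover Cluster Manager shows (a sketch, run on an active node where the Cluster hive is loaded):

# Resource names still present in the cluster registry hive:
Get-ChildItem HKLM:\Cluster\Resources | ForEach-Object { $_.GetValue('Name') }
# Resources the cluster actually exposes right now:
Get-ClusterResource | Format-Table Name, State, OwnerGroup, ResourceType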


Can I use SCVMM to create a clustered VM?


When I try to create a VM via the SCVMM console, it tells me I can't save the VHD on a Cluster Shared Volume.

So every time I create an HA VM, I need to go to the Hyper-V host and use the Failover Cluster console. It's not convenient.

Can I use the SCVMM console to create a clustered VM?
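This is the workaround I use today from the Hyper-V host (a sketch; the VM name, path and sizes are just examples):

# Create the VM with its VHDX on the CSV, then make it highly available:
New-VM -Name "TESTVM" -MemoryStartupBytes 4GB -Path "C:\ClusterStorage\Volume1\TESTVM" `
       -NewVHDPath "C:\ClusterStorage\Volume1\TESTVM\TESTVM.vhdx" -NewVHDSizeBytes 60GB
Add-ClusterVirtualMachineRole -VMName "TESTVM"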

Thanks

SQL cluster (VM) on VM failover cluster

I have 2 SQL Server VMs, and they are clustered. The VMs run on a Hyper-V failover cluster.
So I want to know: when I move the VMs and select "Best Possible Node", or use "Optimize Host" (in the SCVMM console), is there a chance that the 2 SQL Servers will move to the same node?
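At the moment I just check the placement by hand after each move; a sketch of that check:

# Are both SQL VMs sitting on the same OwnerNode?
Get-ClusterGroup | Where-Object GroupType -eq 'VirtualMachine' |
    Format-Table Name, OwnerNode, State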

Thanks,

2008 R2 SQL cluster migration to Server 2012 R2


I have a two-node SQL 2008 R2 cluster on Server 2008 R2. The data currently sits on an EqualLogic SAN.

I will be replicating the data from the EqualLogic to a Compellent SAN for use on the new hardware.

The new cluster will be Server 2012 R2, and the SQL version will not change.

Can someone point me in the direction of a document, or outline the steps to accomplish this?

Windows failover Cluster - Active Passive


I have configured a two-node cluster, and both servers show as ACTIVE in the cluster configuration. I want to configure it as ACTIVE/PASSIVE. What should be done to achieve this?
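For what it's worth, this is what I assume gets close to active/passive behaviour, but I'd like confirmation that it is the right approach (role and node names are placeholders):

# List NODE1 first as the preferred owner, then keep the role on the "active" node:
Set-ClusterOwnerNode -Group "SQL Role" -Owners NODE1, NODE2
Get-ClusterGroup "SQL Role" | Move-ClusterGroup -Node NODE1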

S2D IO TIMEOUT when rebooting node


I am building a 6-node cluster: 12 x 6TB drives, 2 x 4TB Intel P4600 PCIe NVMe drives, Xeon Platinum 8168 CPUs, 768GB RAM, and an LSI 9008 HBA.

The cluster passes all tests, the switches are properly configured, and the cluster works well, exceeding 1.1 million IOPS with VMFleet. However, at the current patch level (as of now, April 18 2018) I am experiencing the following scenario:

When no storage job is running and all vdisks are listed as healthy, I can pause a node and drain it and all is well, until the server actually reboots or is taken offline. At that point a repair job is initiated and IO suffers badly, and can even stop altogether, causing vdisks to go into a paused state due to IO timeout (listed as the reason in cluster events).

Exacerbating this issue, when the paused node reboots and rejoins, it causes the repair job to suspend, stop, then restart (it seems - tracking this is hard, as all storage commands become unresponsive while the node is joining). At this point IO is guaranteed to stop on all vdisks at some point, for long enough to cause problems, including VM reboots.

The cluster was initially formed using VMM 2016. I have tried manually creating the vdisks, using single resiliency (3-way mirror) and multi-tier resiliency, with the same effect. This behavior was not observed when I did my POC testing last year. It's frankly a deal breaker and unusable: if I cannot reboot a single node without stopping my entire workload, I cannot deploy. I'm hoping someone has some info. I'm going to re-install with Server 2016 RTM media, keep it unpatched, and see if the problem remains. However, it would be desirable to at least start the cluster fully patched. Any help appreciated. Thanks
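PS: this is what I watch from another node while the paused node is rebooting, in case the pattern means something to someone (a sketch):

Get-StorageJob | Format-Table Name, JobState, PercentComplete, BytesTotal
Get-VirtualDisk | Format-Table FriendlyName, OperationalStatus, HealthStatus
Get-ClusterSharedVolume | Format-Table Name, State, OwnerNode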


Storage Spaces Direct Issues with May 17th Update Rollup


Hi,

We have had major issues updating our hyperconverged S2D cluster with the May 17th 2018 Update Rollup.
The issue occurred while rebooting each cluster node as the node was shutting down to reboot.

Our cluster pool looked to have partially failed and some virtual machines crashed, failed over and restarted each time a node was rebooted to apply the update rollup.

Firstly, some background. This is a 4-node cluster with fully validated Dell R730XD servers. Cluster validation tests all pass successfully, including 'Verify Node & Disk Configuration' for SES-supported config. We have also verified and validated our network configuration and switches with Dell.
We ensured no storage jobs were running and that all virtual and physical disks were healthy. File share witness was online and available during the patching.

We paused one node, applied the update rollup, and after successful installation clicked to reboot the node. As the node was shutting down we got the following events:

Event ID: 1289: Source: Microsoft-Windows-FailoverClustering.

The Cluster Service was unable to access network adapter "Microsoft Failover Cluster Virtual Miniport". Verify that other network adapters are functioning properly and check the device manager for errors associated with adapter "Microsoft Failover Cluster Virtual Miniport". If the configuration for adapter "Microsoft Virtual Miniport" has been changed, it may become necessary to reinstall the failover clustering feature on this computer.
*******************************************************

Event ID: 5395: Source: Microsoft-Windows-FailoverClustering.

Cluster is moving the group for storage pool 'Cluster Pool 1' because current node 'HYPER2' does not have optimal connectivity to the storage pool physical disks.
***************************

I noted that event ID 5395 never referred to the node that was being patched or rebooted; it was always another node in the cluster.

After the reboot, once the node joined back into the cluster, the repair jobs ran and completed successfully. When we carried out the same procedure on the other nodes, the same issue occurred.
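For clarity, this is the per-node procedure we followed, sketched in PowerShell (the node name is a placeholder):

Get-StorageJob                                   # confirm no repair/rebuild is running
Suspend-ClusterNode -Name HYPER2 -Drain -Wait    # pause the node and drain its roles
# ... install the update rollup and reboot the node ...
Resume-ClusterNode -Name HYPER2 -Failback Immediate
Get-StorageJob                                   # wait for repair jobs to finish before the next node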

Has anyone else experienced these issues? We are tearing our hair out, as Dell cannot find any issues and our customer has lost all confidence in Storage Spaces Direct due to the constant instability.

Thanks,

 


Microsoft Partner



Cluster Set documentation


Cluster engineering has just posted a new blog regarding Cluster Sets that includes a video and links to Cluster Sets documentation.

https://blogs.msdn.microsoft.com/clustering/2018/07/10/introduction-to-cluster-sets-in-windows-server-2019/



tim

Windows 2008 R2 Cluster Management cannot connect to the cluster


Windows 2008 R2 Cluster Management cannot connect to the cluster.

When I open Failover Cluster Manager, the list is empty. How should I proceed?

When I try to connect to the cluster, it tells me 'You are not allowed to connect to the cluster, because you are not an administrator on the cluster node'.

The cluster was fine after installation until the last few months.

The server's firewall is turned off, and File and Printer Sharing is enabled.
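These are the checks I have done so far (a sketch; the cluster name is a placeholder):

Get-Service ClusSvc                   # is the cluster service running on the node?
Get-Cluster -Name CLUSTER01           # can PowerShell reach the cluster at all?
net localgroup Administrators         # is my account really a local admin on the node?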

Please help me, thanks a lot.


Storage Spaces Direct (S2D) - Poor write performance with 5 nodes, each with 24 Intel P3520 NVMe SSDs, over a 40Gb IB network


Need a little help with my S2D cluster which is not performing as I had expected.

Details:

5 x Supermicro SSG-2028R-NR48N servers with 2 x Xeon E5-2643v4 CPUs and 96GB RAM

Each node has 24 x Intel P3520 1.2TB NVME SSDs

The servers are connected over an Infiniband 40Gb network, RDMA is enabled and working.

All 120 SSDs are added to S2D storage pool as data disks (no cache disks). There are two 30TB CSVs configured with hybrid tiering (3TB 3-way mirror, 27TB Parity)

I know these are read-intensive SSDs and that parity write performance is generally pretty bad, but I was expecting slightly better numbers than I'm getting:

Tested using CrystalDiskMark and diskspd.exe

Multithreaded Read speeds: < 4GBps (seq) / 150k IOPs (4k rand)

Singlethreaded Read speeds: < 600MBps  (seq) 

Multithreaded Write speeds: < 400MBps  (seq) 

Singlethreaded Write speeds: < 200MBps (seq) / 5k IOPS (4k rand)

I did manage to improve these numbers by configuring a 4GB CSV cache and forcing write-through on the CSVs:

Max reads: 23GBps / 500K 4K IOPS; max writes: 2GBps / 150K 4K IOPS

That high read performance is due to the CSV cache, which uses memory. Write performance is still pretty bad though. In fact, it's only slightly better than the performance I would get from a single one of these NVMe drives. I was expecting much better performance from 120 of them!

I suspect that the issue here is that Storage Spaces is not recognising that these disks have PLP (power loss protection), which you can see here:

Get-StoragePool "*S2D*" | Get-PhysicalDisk | Get-StorageAdvancedProperty

FriendlyName          SerialNumber       IsPowerProtected IsDeviceCacheEnabled
------------          ------------       ---------------- --------------------                   
NVMe INTEL SSDPE2MX01 CVPF7165003Y1P2NGN            False                     
WARNING: Retrieving IsDeviceCacheEnabled failed with ErrorCode 1.
NVMe INTEL SSDPE2MX01 CVPF717000JR1P2NGN            False                     
WARNING: Retrieving IsDeviceCacheEnabled failed with ErrorCode 1.
NVMe INTEL SSDPE2MX01 CVPF7254009B1P2NGN            False                     
WARNING: Retrieving IsDeviceCacheEnabled failed with ErrorCode 1.
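What I am considering trying next, sketched below, is marking the pool as power protected so Storage Spaces stops forcing write-through/flushes to these drives. I have NOT run this yet and I am not certain it is safe, which is partly why I'm asking:

# Tell Storage Spaces the pool's devices are power protected (assumption: this is appropriate
# for PLP-capable drives like these - please correct me if not):
Get-StoragePool "*S2D*" | Set-StoragePool -IsPowerProtected $true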

Any help with this issue would be appreciated.

Thanks.

CPU Usage: 50% or above on all Server 2016 Hyper-V clustered nodes with 3 VMs


Hello,

We have a 3-node Server 2016 cluster, based on Dell R730 servers. All hosts show a CPU usage of 50% or above, while Task Manager shows the correct value (around 0%).

Only 3 small test VMs are running on those hosts. Has anyone seen this behavior?
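For comparison, these are the two counters we look at on a host (a sketch; as far as I understand, the Hyper-V counter reflects total load including guests, unlike the plain processor counter):

Get-Counter '\Processor(_Total)\% Processor Time'
Get-Counter '\Hyper-V Hypervisor Logical Processor(_Total)\% Total Run Time'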

We would like to resolve this.

Regards,

Jan



Is a 2-Node + Cloud Witness S2D Cluster Less Reliable than a 3-Node + Cloud Witness S2D Cluster?


I work for a company that could easily run all our services off one host, let alone 2 or 3, but before I started working here they purchased 3 Datacenter + SA licenses, so we could do that without needing another license. We would still like a cluster to minimize downtime in the event of a hardware failure, so I've been looking into an S2D cluster and have an S2D lab set up.

I keep reading vague/anecdotal claims that S2D is unreliable or doesn't work right with only 2 nodes. Is there any truth to that? Microsoft documentation states that a 2-node cluster might not fail over properly without a witness or a 3rd node; is that what people mean when they say 2-node is unreliable? I wouldn't consider 2-node without a witness, and I can't see why anyone would.

Of course you can't read the minds of random people on the internet, but I'm not sure what they're talking about. Is an S2D 2-node + witness cluster less reliable than 3-node + witness? Obviously 2-node can tolerate only one failure and not two, but we have offsite Hyper-V replicas already.

Is there something in S2D's design that means it simply doesn't work reliably with 2 nodes + witness?
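For context, this is the quorum configuration I'm planning for the 2-node variant (a sketch; the storage account name and key are placeholders):

# Configure an Azure cloud witness for the cluster quorum:
Set-ClusterQuorum -CloudWitness -AccountName "mystorageaccount" -AccessKey "<storage-account-key>"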
