Channel: High Availability (Clustering) forum
Viewing all 6672 articles

Nodes randomly losing communication with cluster

We have a 6-node production cluster running Windows Server 2008 R2 and SQL Server 2008 R2. At random times, a node loses communication with the cluster, which causes every instance on that node to fail over to other nodes. The event logs are very generic (event IDs 1006 and 1335). We have disabled TCP offloading, updated the NIC drivers, and installed various patches (KB2524478, 2552040, 2685891, 2687741, 2754804), but it's still happening. If anyone has any information that can help, please let me know. Here is what appears in the cluster log at the time of the disconnect.

00000950.00000b14::2013/02/20-12:37:09.511 WARN  [CHANNEL ~] failure, status WSAETIMEDOUT(10060)

00000950.00000ae4::2013/02/20-12:37:09.511 WARN  [CHANNEL ~] failure, status WSAECONNRESET(10054)

00000950.000009cc::2013/02/20-12:37:09.518 INFO  [ACCEPT] :::~3343~: Accepted inbound connection from remote endpoint:~51451~.

00000950.0000133c::2013/02/20-12:37:09.518 INFO  [SV] Route local (~) to remote  (:~51451~) exists. Forwarding to alternate path.

00000950.0000133c::2013/02/20-12:37:09.518 INFO  [SV] Securing route from (~) to remote  (:~51451~).

00000950.0000133c::2013/02/20-12:37:09.518 INFO  [SV] Got a new incoming stream from:~51451~

00000950.00000b14::2013/02/20-12:37:09.519 INFO  [PULLER evproddb13] Parent stream has been closed.

00000950.00000b14::2013/02/20-12:37:09.519 ERR   [NODE] Node 4: Connection to Node 7 is broken. Reason Closed(1236)' because of 'channel to remote endpoint 3343~ has failed with status WSAETIMEDOUT(10060)'

00000950.00000b14::2013/02/20-12:37:09.519 WARN  [NODE] Node 4: Initiating reconnect with n7.

00000950.00000b14::2013/02/20-12:37:09.519 INFO  [MQ-evproddb13] Pausing

00000950.00001988::2013/02/20-12:37:09.519 INFO  [Reconnector-evproddb13] Reconnector from epoch 1 to epoch 2 waited 00.000 so far.

00000950.00001988::2013/02/20-12:37:09.519 INFO  [CONNECT]:~3343~ from local ~: Established connection to remote endpoint:~3343~.

00000950.00001988::2013/02/20-12:37:09.519 INFO  [Reconnector-evproddb13] Successfully established a new connection.

00000950.00001988::2013/02/20-12:37:09.520 INFO  [SV] Route local (:~52834~) to remote evproddb13 (~) exists. Forwarding to alternate path.

00000950.00001988::2013/02/20-12:37:09.520 INFO  [SV] Securing route from (:~52834~) to remote evproddb13 (3343~).

00000950.00001988::2013/02/20-12:37:09.520 INFO  [SV] Got a new outgoing stream to evproddb13 at 3343~

00000950.00000ae4::2013/02/20-12:37:09.525 ERR   [NODE] Node 4: channel (write) to node 7 is broken. Reason Closed(1236)' because of 'channel to remote endpoint:~3343~ has failed with status WSAECONNRESET(10054)'

00000950.00000ae4::2013/02/20-12:37:09.525 WARN  [NODE] Node 4: Initiating reconnect with n7.

00000950.00000ae4::2013/02/20-12:37:09.525 INFO  [MQ-evproddb13] Pausing

00000950.00000b14::2013/02/20-12:37:09.525 INFO  [NODE] Node 4: Cancelling reconnector...

00000950.00002318::2013/02/20-12:37:09.525 INFO  [Reconnector-evproddb13] Reconnector from epoch 1 to epoch 2 waited 00.000 so far.

00000950.00000b14::2013/02/20-12:37:09.525 INFO  [CONNECT] 3343~ from local 14:~0~: Established connection to remote endpoint 3343~.

00000950.00000b14::2013/02/20-12:37:09.525 INFO  [Reconnector-evproddb13] Successfully established a new connection.

00000950.00000b14::2013/02/20-12:37:09.525 INFO  [SV] Route local (:~52836~) to remote evproddb13 (:~3343~) exists. Forwarding to alternate path.

00000950.00000b14::2013/02/20-12:37:09.526 INFO  [SV] Securing route from (:~52836~) to remote evproddb13 (:~3343~).

00000950.00000b14::2013/02/20-12:37:09.526 INFO  [SV] Got a new outgoing stream to evproddb13 at:~3343~
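When this excerpt was pasted, several entries ran together on single lines. The Python sketch below splits such text back into individual entries and lists the Winsock error codes (such as 10060 and 10054) with their timestamps, which makes it easier to line up the failures across the nodes' logs. It assumes the `<process>.<thread>::<timestamp> <LEVEL>` header layout visible in this excerpt; that layout is inferred from the sample, not taken from a published format specification. `parse_entries` and `socket_errors` are hypothetical helper names.

```python
import re
from typing import NamedTuple

# Entry header as seen in the excerpt: <pid>.<tid>::<yyyy/mm/dd-hh:mm:ss.mmm> <LEVEL>
# (layout inferred from the sample lines, not from a format specification)
HEADER = re.compile(
    r"(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
)
# Winsock errors appear as e.g. WSAETIMEDOUT(10060)
WSA_CODE = re.compile(r"WSAE\w+\((\d+)\)")


class Entry(NamedTuple):
    pid: str
    tid: str
    timestamp: str
    level: str
    message: str


def parse_entries(text: str) -> list[Entry]:
    """Split raw log text into entries, even when several share one line."""
    headers = list(HEADER.finditer(text))
    entries = []
    for i, h in enumerate(headers):
        # A message runs until the next header (or the end of the text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        entries.append(Entry(h["pid"], h["tid"], h["ts"], h["level"],
                             text[h.end():end].strip()))
    return entries


def socket_errors(entries: list[Entry]) -> list[tuple[str, int]]:
    """Return (timestamp, Winsock error code) for every WSAE...(code) mention."""
    return [(e.timestamp, int(code))
            for e in entries
            for code in WSA_CODE.findall(e.message)]
```

For example, `socket_errors(parse_entries(text))`, where `text` is a saved copy of each node's cluster log, gives a per-node timeline of socket failures that can be compared side by side.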




Continuously available storage for hyper-v

Please excuse any inaccuracies in my understanding, as I still have limited Windows Server 2012 experience and no clustering experience. I am trying to build a Windows Server 2012 two-node failover cluster in a test lab environment (though this configuration should also be deployable to small and mid-sized businesses) that provides continuous availability for Hyper-V virtual machines. My goal is to use consumer-grade hardware (i.e., Supermicro 4U towers, SATA drives, onboard SATA controllers or inexpensive RAID cards, etc.) without a single point of failure (e.g., a dedicated SAN, whether iSCSI-connected or externally provided SMB shares, that goes offline). Ideally, all Hyper-V-related files would be self-contained within the two nodes of the cluster, so that if one node fails, the other node takes over the full load without service disruption.

Based on previous research and discussion, I have found that StarWind's Native SAN for Hyper-V accomplishes this, but I haven't found a clear answer on whether it can be done with built-in WS12 features. I would like to explore JBOD Storage Spaces as well as Cluster Shared Volumes.

Proposed Setup:

Node 1:  RAID1 mirror for boot, three additional SATA hard drives for cluster storage.

Node 2:  RAID1 mirror for boot, three additional SATA hard drives for cluster storage.

Create a Windows Server 2012 Failover Cluster.

Create a storage pool with all six SATA drives across both nodes (all data mirrored across six disks on two nodes).

Create fixed disks for Cluster Shared Volumes (if one node goes down, there is transparent failover for continuous data access).

Will this scenario work, even if my steps might be slightly wrong?  If not, what changes can be made in order to accomplish this?

Thank you

Cluster-Aware Updating DNS configuration

Hello,

I've got a simple 2 node Server 2012 cluster and have configured CAU. It actually runs the updates just fine, but something is not quite right. 

When I run Server Manager, it lists the "entity" (I don't know what to call it; it's not a server, but it's listed there) that was created when I enabled CAU, named CAUhvcluxc4, with the error "Target name resolution error." I tried adding a CNAME DNS record for CAUhvcluxc4 pointing to the name of my cluster. That made the error go away, but Failover Cluster Manager then started logging errors saying "dns rr set that ought not exist does exist."

I'm sure there's something simple I'm doing wrong here, but I can't figure it out.

Clustering explained 101 ?

I'm trying to understand a fundamental concept of clustering - is it just the DATA that's replicated (or is SHARED more technically correct?) or is it also the APPLICATION that replicated (or shared)?

I have identical servers and I'm trying to plan for disaster recovery and I'm wondering if "clustering" might be the best way to plan for failure?

I have a mission critical SQL application that I'd like to keep online at all times. If I understand clustering, it's just the data that's replicated (shared) so if the software program that accesses the SQL database is on a server that becomes unavailable...you're screwed - it doesn't matter that the DATA is protected against failure, right?

Or am I wrong? - clustering will replicate (share) the application itself so if either server becomes unavailable...the end users wouldn't know the difference and can continue to work?

Yes? No?

(Finally, I'd like to design my Exchange for continuous availability as well - is clustering a good candidate for it as well?)

Ed

Adding a second NIC failover Cluster for the SAN connection

Hello guys, I have a question about Hyper-V clustering.

I have a 3-node Hyper-V cluster with 6 NICs.

1 NIC For Management, 192.x.x.x

2 NICs for SAN communication, 10.x.x.x. both are in the same 10.x.x.x subnet 255.0.0.0

1 NIC for the Heartbeat 172.x.x.x 255.255.0.0

2 NICs Just for VMs assignments.

All of these connections go to a stack of 2 switches. Cables are alternated accordingly for redundancy: node1 to SAN on NIC1 > switch1, node1 to SAN on NIC2 > switch2. All three nodes have the same configuration.

Last week switch 1 failed and we lost connection to all VMs. When I checked in Failover Cluster Manager, one of the NICs dedicated to SAN communication was missing on each server. I guess that explains why we lost connection to the VMs. The SAN itself never went down, because it has two controllers and kept working through the second switch.

Clustering and failover work fine with live migrations as long as the failure is on a node, but when the first switch lost power we lost connection to the VMs. That tells me the connection from the hosts to the VHDs on the SAN was lost: all three nodes lost connection to the SAN.

My question is: how can I add the second NIC on each node to the “SAN Connect” group? Only one NIC with a 10.x.x.x address shows in that group. Do I need to assign another subnet to the second NIC (for example, 11.x.x.x) to make it available to the cluster? If so, I will have to reconfigure the SAN to include both the 10.x.x.x and 11.x.x.x subnets. Thank you in advance; any help will be very much appreciated.
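Failover Clustering defines cluster networks by subnet, so two NICs on one node whose addresses share a single 10.0.0.0/8 subnet fall into the same cluster network rather than forming a second one. That is consistent with only one 10.x.x.x NIC per node appearing under “SAN Connect”. The Python sketch below groups a node's NICs by subnet the same way using the standard `ipaddress` module; the NIC names and addresses are hypothetical stand-ins for the masked values in the post, and `nics_by_subnet` is a hypothetical helper name.

```python
import ipaddress


def nics_by_subnet(nics: dict[str, str]) -> dict[str, list[str]]:
    """Group NIC names by the IP network implied by each address/prefix."""
    groups: dict[str, list[str]] = {}
    for name, address in nics.items():
        network = str(ipaddress.ip_interface(address).network)
        groups.setdefault(network, []).append(name)
    return groups


# Hypothetical addresses standing in for the masked values in the post:
# both SAN NICs use a 255.0.0.0 (/8) mask, the heartbeat NIC 255.255.0.0 (/16).
node1 = {
    "SAN1": "10.0.0.11/8",
    "SAN2": "10.0.0.12/8",
    "Heartbeat": "172.16.0.11/16",
}

print(nics_by_subnet(node1))
# {'10.0.0.0/8': ['SAN1', 'SAN2'], '172.16.0.0/16': ['Heartbeat']}
```

Giving the second SAN NIC on every node an address in a different subnet would place it in its own group here. Note that 11.0.0.0/8 is public address space, not one of the RFC 1918 private ranges; `ipaddress.ip_address("11.0.0.1").is_private` returns `False`.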

availability groups and objects outside of database containment

With Always On availability groups, system databases cannot be replicated. If I want to use availability groups, how do I make objects that exist outside of database containment (Agent jobs, linked servers, etc.) highly available? Or am I barking up the wrong tree and this cannot be done, as I suspect might be the case?

Cannot create file share on 2-node 2012 file server cluster

I'm basically following this guide: http://derek858.blogspot.com.au/2012/10/windows-server-2012-smb-transparent.html but I am not using RDM disks, just standard VMDKs running on VMFS. I can successfully create the cluster, and everything looks like it's running.

When I go to add a file share using Failover Cluster Manager, I only get the option to specify a custom path; I cannot select 'by volume', and no volume is listed. When I enter a custom path, I get the error "the entered path is not valid for the selected server. Please enter a new path or select a different server".

Any ideas?

Live Migration failed - failed to delete configuration: The request is not supported. (0x80070032). Event ID 21502

We have a 3-node cluster attached to a SAN. All nodes are running Server 2012. We have 2 virtual machines that will no longer live migrate or quick migrate. When we try, we get the following error message.

Event ID: 21502

Live migration of 'Virtual Machine Library' failed.

Virtual machine migration operation for 'SRV-XXX' failed at migration source 'NODE1'. (Virtual machine ID 8CC600A0-5491-45B1-896E-E99BB85AA856)

'SRV-XXX' failed to delete configuration: The request is not supported. (0x80070032). (Virtual machine ID 8CC600A0-5491-45B1-896E-E99BB85AA856)

We are not having this issue with any of our other 15 virtual machines.  I have searched the forums and have not found any articles with the same situation.
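The code 0x80070032 in this event is an HRESULT. The Python sketch below splits it using the standard HRESULT bit layout: facility 7 (FACILITY_WIN32) marks a wrapped Win32 error, and the low 16 bits give code 50, which is ERROR_NOT_SUPPORTED, whose message text is the "The request is not supported." shown in the event. `decode_hresult` is a hypothetical helper name.

```python
FACILITY_WIN32 = 7  # facility used when a Win32 error code is wrapped in an HRESULT


def decode_hresult(hr: int) -> dict:
    """Split a 32-bit HRESULT into its severity, facility and code fields."""
    hr &= 0xFFFFFFFF
    return {
        "failure": bool(hr >> 31),       # bit 31: severity (1 = failure)
        "facility": (hr >> 16) & 0x7FF,  # bits 16-26: facility
        "code": hr & 0xFFFF,             # bits 0-15: error code within the facility
    }


print(decode_hresult(0x80070032))
# {'failure': True, 'facility': 7, 'code': 50}
```

The same decoding applies to other 0x8007xxxx codes quoted in this forum; for example, 0x80071398 yields facility 7 and code 5016.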


Unable to create 2-Node Cluster - Timeout Server 2012

Hello Everyone,

I have two Server 2012 machines that I'm trying to set up clustering with. Individually, each can create its own 1-node cluster with no problem, but adding one to an existing cluster doesn't work either: if I destroy one cluster and try to add its node to the other existing cluster, it fails with a timeout. When I run the Validate a Configuration Wizard, it says everything checks out successfully (warnings on Storage and Network for minor things), but creating the cluster then fails with the following:

Beginning to configure the cluster SERVICES.
Initializing Cluster SERVICES.
Validating cluster state on node SERVER1.
Searching the domain for computer object 'SERVICES'.
Creating a new computer account (object) for 'SERVICES' in the domain.
Configuring computer object 'SERVICES in organizational unit OU=Servers,DC=xxxxxx,DC=xxxxx' as cluster name object.
Validating installation of the Network FT Driver on node SERVER1.
Validating installation of the Cluster Disk Driver on node SERVER1.
Configuring Cluster Service on node SERVER1.
Validating installation of the Network FT Driver on node SERVER2.
Validating installation of the Cluster Disk Driver on node SERVER2.
Configuring Cluster Service on node SERVER2.
Waiting for notification that Cluster service on node SERVER2 has started.
Forming cluster 'SERVICES'.
Unable to successfully cleanup.
An error occurred while creating the cluster and the nodes will be cleaned up. Please wait...
An error occurred while creating the cluster and the nodes will be cleaned up. Please wait...
There was an error cleaning up the cluster nodes. Use Clear-ClusterNode to manually clean up the nodes.
There was an error cleaning up the cluster nodes. Use Clear-ClusterNode to manually clean up the nodes.
An error occurred while creating the cluster.
An error occurred creating cluster 'SERVICES'.

This operation returned because the timeout period expired
To troubleshoot cluster creation problems, run the Validate a Configuration wizard on the servers you want to cluster.

I tried going into ADUC and giving the computer accounts full access under Security, and also prestaging the cluster name. That didn't seem to help, even after the wizard enabled the disabled computer account when forming the cluster. When I try to join one node to an existing cluster, I can see that it joins, never comes "UP", and then gets evicted after a timeout.

Suggestions on where to go next would be much appreciated. It's interesting to me, since I've set up multiple clusters on 2008/R2 and am currently running clustered Hyper-V 2012 servers as the hosts for these without any issues, so right now there are 2 clusters in the environment without any problems. I started a debug log while creating the failover cluster but didn't see anything that stood out as useful. I'm not quite sure where to go next.


Can we install PXE services on a Windows Server 2008 R2 failover cluster and make the service highly available?

Hi,

We want to install PXE services on a Windows Server 2008 R2 failover cluster and make the PXE service highly available.

My idea was to install the PXE service on both nodes and configure it for high availability, but I don't know if this is possible.

Has anyone had experience with this? Thx.


Wkr, Steve


Creating new cluster with SAS storage

I have new equipment, 2 Dell rack servers running Windows Server 2012 Datacenter and a Dell storage array. The array has 14 disks. I've setup a RAID 10 volume using 12 of the disks with 2 for hotspares. The 2 servers are connected to the array by fibre and they each see the volume as a drive letter.

I've read through some very good articles at MSDN but I'm still left with some questions.

1. Can I create the cluster using the array as storage with it in its current configuration or will I have to delete the volume and create a storage pool using 12 of the 14 disks?

Creating the cluster itself seems straightforward but it's the storage part that I'm uncertain about.

2. I'm also wondering whether the two servers need one NIC each connected directly together, or whether their connection to the LAN is fine. The Windows Server 2003 clusters I've managed (but didn't set up) had one NIC in each node connected directly together in a private network, and I don't know whether that should be done here.


Jonathan

Design Question About Witness File Server

I have 2 servers that will use Hyper-V and Storage Spaces for clustering.

Instead of having a dedicated third server as the witness server for my DAG, which would be a single point of failure, I think I would be better off clustering my Windows Server 2012 file server on Hyper-V across these two servers. If one server failed, the file server would automatically move to the other, and if both failed, my Exchange deployment would fail regardless.

Am I thinking correctly, and would this provide better HA for my network? I would like to wipe the older, slower third box and use it for other applications.
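The question comes down to vote arithmetic. As a simplified sketch, assuming one vote per server plus one for the file share witness and a strict majority rule (ignoring refinements such as dynamic quorum in Windows Server 2012), the Python below checks whether the votes that survive a failure still form a majority. `has_quorum` is a hypothetical helper name, and the sketch only counts votes at the moment of failure.

```python
def has_quorum(total_votes: int, surviving_votes: int) -> bool:
    """Strict majority: more than half of all configured votes must remain."""
    return surviving_votes > total_votes // 2


TOTAL_VOTES = 3  # two servers + one file share witness

# Witness on a separate third box: one server fails, the other server and
# the witness still hold 2 of 3 votes.
print(has_quorum(TOTAL_VOTES, 2))  # True

# Witness hosted on the same server that fails: that server's vote and the
# witness vote are lost together, leaving 1 of 3 votes.
print(has_quorum(TOTAL_VOTES, 1))  # False
```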

Windows 2008 cluster disks show "Reserved" under Disk Management

I have a 2-node Windows 2008 cluster. All disks show online under Failover Cluster Manager > Storage, but the SAN disks show as "Reserved" under Storage > Disk Management.

I have tried clearing persistent reservations and running cluster validation. The validation report shows no errors, and a test failover works properly too.

However, there are some warnings in Windows Logs > Storage Manager for SANs:

Lun - 'MSCS-DTC'
Service.GetObject - 'System.InvalidOperationException - 'Unknown VDS object type specified.''
Service.GetObject - 'System.InvalidOperationException - 'Unknown VDS object type specified.''
Service.GetObject - 'System.InvalidOperationException - 'Unknown VDS object type specified.''
Service.GetObject - 'System.InvalidOperationException - 'Unknown VDS object type specified.''


Lun - 'MSCS-Quorum'
Service.GetObject - 'System.InvalidOperationException - 'Unknown VDS object type specified.''
Service.GetObject - 'System.InvalidOperationException - 'Unknown VDS object type specified.''
Service.GetObject - 'System.InvalidOperationException - 'Unknown VDS object type specified.''
Service.GetObject - 'System.InvalidOperationException - 'Unknown VDS object type specified.''


Lun - 'MSCS-Logs'
Service.GetObject - 'System.InvalidOperationException - 'Unknown VDS object type specified.''
Service.GetObject - 'System.InvalidOperationException - 'Unknown VDS object type specified.''
Service.GetObject - 'System.InvalidOperationException - 'Unknown VDS object type specified.''
Service.GetObject - 'System.InvalidOperationException - 'Unknown VDS object type specified.''


Lun - 'MSCS-Data'
Service.GetObject - 'System.InvalidOperationException - 'Unknown VDS object type specified.''
Service.GetObject - 'System.InvalidOperationException - 'Unknown VDS object type specified.''
Service.GetObject - 'System.InvalidOperationException - 'Unknown VDS object type specified.''
Service.GetObject - 'System.InvalidOperationException - 'Unknown VDS object type specified.''

Does anyone have an idea if it is a problem?  Can I ignore the warnings?

Thanks in advance.

Melissa


Required High Performance Cluster in windows

Hi,

How can I configure a high-performance cluster, and what operating system should I use to achieve that?

Clustered Storage Spaces - issue with SAS interposers and all SATA drives (HDD and SSD)

We have an OEM storage vendor telling us that it is NOT recommended at this time to use SAS interposers with any SATA drives (regardless of brand or type (HDD or SSD)) and use it with Windows Storage Spaces/Clustered Storage Spaces. This would be for a shared-SAS backend using a JBOD/HBA and Windows Storage Spaces as a scale-out file server solution. This is the setup you see referenced from Microsoft all over the place.

Is anyone seeing any issues? They claim it's an issue with the LSI SAS interposers (LSISS9252) that all storage vendors rebrand or sell. And, apparently LSI is not seeing this as a high priority fix.

They're alleging that Microsoft product development has told them it's an issue with the firmware on the SAS interposers (which is only upgradable by OEMs BTW) and that using these will result in drives dropping off/not showing up.

BTW, this comes from a Windows Storage Spaces certified product...thoughts? I'm just looking for any information so we can make an informed decision.





Windows 2008r2 clustering multiple network question

I want to build an active/passive 2008 R2 cluster that will run NetBackup 7.5. I have multiple networks that must connect to the cluster so that each subnet can back up. My question is: for each subnet, do I need an IP for each cluster node plus a VIP?

I would imagine that each VIP could be listed in DNS as vip1.domain.com, vip2.domain.com, vip3.domain.com, and so on?

But that doesn't make sense to me, because for NetBackup all networks need to talk to the same VIP name from all subnets.

Hopefully someone can answer this for me...Thanks!!
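One way to check a multi-subnet layout like this is sketched below in Python with the standard `ipaddress` module. It assumes that each clustered IP address (VIP) must fall in a subnet on which every node that can own the service has an interface; that assumption is the rule being illustrated, not something stated in the question. Node names, addresses and VIPs are hypothetical, and `vip_gaps` is a hypothetical helper name.

```python
import ipaddress


def vip_gaps(node_ifaces: dict[str, list[str]], vips: list[str]) -> list[tuple[str, str]]:
    """Return (vip, node) pairs where the node has no interface in the VIP's subnet."""
    gaps = []
    for vip in vips:
        address = ipaddress.ip_address(vip)
        for node, ifaces in node_ifaces.items():
            if not any(address in ipaddress.ip_interface(i).network for i in ifaces):
                gaps.append((vip, node))
    return gaps


# Hypothetical layout: NodeB is missing an interface on the 192.168.20.0/24 subnet.
nodes = {
    "NodeA": ["192.168.10.11/24", "192.168.20.11/24"],
    "NodeB": ["192.168.10.12/24"],
}
print(vip_gaps(nodes, ["192.168.10.50", "192.168.20.50"]))
# [('192.168.20.50', 'NodeB')]
```

The sketch only encodes the stated assumption; it says nothing about how NetBackup resolves a single name across several subnets.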

2008 Failover cluster disk signature changed and cluster failed during the failover with Event ID 1034

Hi,

We did a firmware upgrade on an HP blade server that is part of a failover Exchange 2007 cluster. After that, cluster failover failed with error 1034:

Cluster physical disk resource 'Cluster Disk 1' cannot be brought online because the associated disk could not be found. The expected signature of the disk was '29EDDDA1'. If the disk was replaced or restored, in the Failover Cluster Management snap-in, you can use the Repair function (in the properties sheet for the disk) to repair the new or restored disk. If the disk will not be replaced, delete the associated disk resource.

We don't know whether it is due to the firmware upgrade.

We have a blade server with EMC storage, and all the cluster disks are on the EMC array.

Thanks in advance.

Sony


SI
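Event 1034 identifies the missing disk by its signature ('29EDDDA1'). Assuming the cluster disks are MBR-partitioned (GPT disks are identified by a GUID instead), that value is the 4-byte little-endian field at offset 0x1B8 of the disk's first sector, so comparing it with the signature the LUN presents after the firmware upgrade shows whether the node now sees the disk with a different signature. The Python sketch below reads the field from a copy of sector 0; obtaining those 512 bytes from the LUN is not shown, the sector in the example is synthetic, and `mbr_disk_signature` is a hypothetical helper name.

```python
import struct

MBR_SIGNATURE_OFFSET = 0x1B8  # byte 440 of sector 0: 4-byte disk signature
BOOT_MARKER = b"\x55\xaa"     # last two bytes of a valid MBR sector


def mbr_disk_signature(sector0: bytes) -> str:
    """Return the MBR disk signature as 8 uppercase hex digits (e.g. '29EDDDA1')."""
    if len(sector0) < 512 or sector0[510:512] != BOOT_MARKER:
        raise ValueError("not an MBR sector")
    # The signature is stored as a little-endian 32-bit integer.
    (signature,) = struct.unpack_from("<I", sector0, MBR_SIGNATURE_OFFSET)
    return f"{signature:08X}"


# Synthetic sector carrying the signature from the event above.
sector = bytearray(512)
struct.pack_into("<I", sector, MBR_SIGNATURE_OFFSET, 0x29EDDDA1)
sector[510:512] = BOOT_MARKER
print(mbr_disk_signature(bytes(sector)))  # 29EDDDA1
```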

Windows 2008 R2 Clustering - MSDTC Client access point Limitations (IP)

I'm building a Windows cluster and am adding MSDTC. The client I work for wants its IP address to be an internal address, not an external one. When attempting to install MSDTC, I noticed that under NETWORKS on the Client Access Point screen, the only network available is the external one. Is there a limitation, or can I force it to use an IP from another network? There are 2 NICs/adapters per node: one external, and one internal for the heartbeat.

I'm not sure my client's request is feasible, and I cannot find any explicit information about this. 


m d savilla

Clustered Hyper-V VM cannot be moved to another cluster node

Hello everybody,

I just encountered the following problem:

One of our customers is running two physical Win 2008 R2 SP1 Enterprise servers that host a couple of clustered Hyper-V VMs.

Let's say one of the servers is called SrvVirt2 and the other SrvVirt3.

We have to shut down SrvVirt2 for some hardware servicing, so we began moving the virtual machines hosted on it to SrvVirt3. All of them moved without any problem except one single machine, which we cannot move to the other cluster node.

A couple of seconds after we start the move in Cluster Manager, the system issues the following message:

Error Code: 0x80071398 The operation failed because either the specified cluster node is not the owner of the group, or the node is not a possible owner of the group

We have checked everything over and over. SrvVirt2 and SrvVirt3 are both possible owners of this specific VM.

BTW, it doesn't matter which way we try to get the system to the other node. Move, Fast Move, Live Migration: It all ends up with this message.

Does anyone have an idea or suggestion for how we can move the VM to the other Hyper-V cluster node?

Kind regards

Alex

Constant log error IDs 2050 and 2051 after adding FC storage and making it HA with CSV

Hi,

Yesterday I added a LUN from an FC storage array (EVA 4100) to our Hyper-V cluster and added it to our cluster storage.
This required the newest MPIO driver and QLogic driver from HP, as well as adding the MPIO feature.

Everything works fine: storage failover, VM failover, rebooting. No trouble; I've tried it several times.

The cluster validation check with the storage, which I ran before moving over the VMs, went fine.

Only our logs are going crazy, with those 5 messages reappearing constantly on each cluster node.
I tried searching for information about these errors, but the results are usually cases where failover is not working.

Apart from these log entries, which I don't want to ignore, everything seems to work fine.

HYPERV2 2050
Warning Microsoft-Windows-FailoverClustering
Microsoft-Windows-FailoverClustering/Diagnostic
19.02.2013 00:04:58
[ClRtl] SsCoreShareAdd(): status = 2118 share = 5a226444-0bb5-45ef-9579-3594ee536745-135266304$ server = (null)


HYPERV2 2050  Warning Microsoft-Windows-FailoverClustering
Microsoft-Windows-FailoverClustering/Diagnostic
19.02.2013 00:04:58
[ClRtl] SsCoreShareAdd(): status = 2118 share = CSV$ server = (null)


HYPERV2 2051  Error Microsoft-Windows-FailoverClustering
Microsoft-Windows-FailoverClustering/Diagnostic
19.02.2013 00:04:57
[RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQ returned 21.'


HYPERV2 2050  Warning Microsoft-Windows-FailoverClustering
Microsoft-Windows-FailoverClustering/Diagnostic
19.02.2013 00:04:57
[RCM] Failed to load restype 'MSMQ': error 21.


HYPERV2 2050  Warning Microsoft-Windows-FailoverClustering
Microsoft-Windows-FailoverClustering/Diagnostic
19.02.2013 00:04:57
[RCM] Failed to load restype 'MSMQTriggers': error 21.


HYPERV2 2051  Error Microsoft-Windows-FailoverClustering
Microsoft-Windows-FailoverClustering/Diagnostic
19.02.2013 00:04:57
[RHS] s_RhsRpcCreateResType: ERROR_NOT_READY(21)' because of 'Startup routine for ResType MSMQTriggers returned 21.'

We also have this hotfix installed (we first used iSCSI and had trouble with backups, which then went away). We had version 1 installed; today I tried version 2, just to be sure it's not related. I doubt it is, but it's worth mentioning. Hotfix: KB2813630 - v2

Thanks

Patrick
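The diagnostic entries above carry numeric codes. The Python sketch below pulls those codes out of the message text and labels the two that occur here: 21 is ERROR_NOT_READY (already named in the log), and 2118 is the LAN Manager code NERR_DuplicateShare, "The share name is already in use on this server." The lookup table covers only these two codes, the regular expression is tailored to the message shapes shown in this post, and `codes_in` and `explain` are hypothetical helper names. The labels describe what the codes mean, not why the cluster reports them.

```python
import re

# Only the codes that appear in the entries above (values from the Windows SDK
# headers: winerror.h for 21, lmerr.h for 2118).
KNOWN_CODES = {
    21: "ERROR_NOT_READY: The device is not ready.",
    2118: "NERR_DuplicateShare: The share name is already in use on this server.",
}

# Matches "status = N", "error N" and "SOME_NAME(N)" as they appear in these entries.
CODE = re.compile(r"\bstatus = (\d+)|\berror (\d+)|\b[A-Z_]+\((\d+)\)")


def codes_in(message: str) -> list[int]:
    """Extract the numeric status/error codes quoted in one diagnostic message."""
    return [int(next(g for g in m.groups() if g)) for m in CODE.finditer(message)]


def explain(message: str) -> list[tuple[int, str]]:
    """Pair each extracted code with its description, if it is in KNOWN_CODES."""
    return [(c, KNOWN_CODES.get(c, "unknown")) for c in codes_in(message)]


print(explain("[ClRtl] SsCoreShareAdd(): status = 2118 share = CSV$ server = (null)"))
```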