Channel: High Availability (Clustering) forum
Viewing all 6672 articles

Strange failure of cluster nodes

I have a 4-node failover cluster on Server 2008 R2 which has been running perfectly for many months.

Just tonight 2 of the 4 nodes failed totally with the same symptoms.

The only real error event that is being logged is 1282 'Security Handshake between Joiner and Sponsor did not complete in '30' Seconds, node terminating the connection'

Nothing I do will make the nodes restart. Under the 'Network' node of the Cluster all the network connections of the 2 failed nodes are showing as unavailable. However there is no problem pinging either from or to the failed machines using either the LAN, Heartbeat, or 2 iSCSI NICs.

Google returns almost nothing about error event 1282 which is a bit of a worry.

Any help would be gratefully appreciated, as I can't fit all my VMs onto the remaining 2 nodes and have had to shut down non-critical VMs.


Clustered File share Permissions won't set (with error)

I have a Server 2012 R2 cluster with a file server. When I create a new share, it is created with read-only permissions and I cannot change it to full or read/write. It is only the share permissions that won't set. When I create a share, the error I get is "The Cluster Resource could not be found." If I try to edit the share permissions, the error is "Error Occurred while update an SMB share: The cluster resource could not be found". The admin share is working perfectly and the folder permissions are accurate.

The only solution to this problem I have found online that worked for someone is to destroy the cluster and rebuild it. I really don't want to do that.

Backup File Server Cluster

I have a Windows 2008 file server cluster. We have an issue with our backups in Backup Exec, where the jobs were degrading performance on our network. We have configured our backup server for a new, separate network. The file server cluster is configured for our data network. Is there a way to configure the file server cluster for both the data network and the backup network? I am trying to keep the backup job for the file cluster from degrading the performance of our data network.

Cluster disk Reservation

I have a 2-node Windows 2008 cluster.

How can I tell whether a node holds a reservation on a disk, and whether I have to run Clear-ClusterDiskReservation?


Server 2008 SP2 failover SQL cluster file share witness errors every 15 minutes - 1069 and 1558 codes

I have a four-node SQL cluster (Server 2008 SP2) that logs 1069 and 1558 errors on the file share witness every fifteen minutes, and I have been unable to resolve the issue. I tried changing from node majority/FSW to node majority/disk. However, I got errors that the disk was not suitable for clustering, even though 10 other HP EVA LUNs are presented without issue. I tried creating an empty resource of the physical disk type and mapping it, to no avail. I have read that a reform can be done. However, this is a production SQL cluster; there's no way that I can run all resources on one node, or take an outage for that matter. I also moved the file share witness resource from one server to another, which didn't help. I see there is a hotfix that addresses this issue (http://support.microsoft.com/kb/2750820), but it is only for 2008 R2. Do any cluster experts out there have a way to resolve this?

Correct dependency configuration

Hi all,

We recently deployed a new 2-node Server 2012 R2 cluster. This is running SQL Server 2014 in an AlwaysOn configuration.

The nodes are located in two different subnets (10.70.1.0/24 and 10.70.3.0/24). There is no shared storage in use. We make use of a file share witness. The node configuration is as follows:

Node 1:
Server IP: 10.70.1.30
Cluster IP: 10.70.1.31
SQL Listener IP: 10.70.1.32

Node 2:
Server IP: 10.70.3.30
Cluster IP: 10.70.3.31
SQL Listener IP: 10.70.3.32

I noticed that when looking at the Windows cluster instance dependencies, the below is configured. This looks quite strange and I'm not sure that it's correct. My question is: what should the dependencies be set to, or is it safe to remove them altogether?

Currently the cluster IP of node 2 (10.70.3.31) shows as offline and I'm not able to bring it online. I'm wondering whether this has something to do with the dependency configuration.
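
As a quick sanity check on the addressing listed above, here is a minimal Python sketch (not part of the original post; the helper names `owning_node` and `misplaced_addresses` are hypothetical) that confirms each listed IP falls inside its own node's subnet. It only validates the addressing and says nothing about how the dependencies themselves should be set.

```python
import ipaddress

# Subnets and addresses exactly as listed in the post.
SUBNETS = {
    "Node 1": ipaddress.ip_network("10.70.1.0/24"),
    "Node 2": ipaddress.ip_network("10.70.3.0/24"),
}
ADDRESSES = {
    "Node 1": {
        "Server IP": "10.70.1.30",
        "Cluster IP": "10.70.1.31",
        "SQL Listener IP": "10.70.1.32",
    },
    "Node 2": {
        "Server IP": "10.70.3.30",
        "Cluster IP": "10.70.3.31",
        "SQL Listener IP": "10.70.3.32",
    },
}


def owning_node(address):
    """Return the node whose subnet contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for node, subnet in SUBNETS.items():
        if ip in subnet:
            return node
    return None


def misplaced_addresses():
    """List (node, role, address) tuples whose address is outside that node's subnet."""
    return [
        (node, role, address)
        for node, roles in ADDRESSES.items()
        for role, address in roles.items()
        if owning_node(address) != node
    ]


if __name__ == "__main__":
    print("Misplaced addresses:", misplaced_addresses())
    print("10.70.3.31 belongs to:", owning_node("10.70.3.31"))
```

With the values from the post, no address is misplaced, so the offline 10.70.3.31 resource is at least in the subnet it is expected to be in. Which node currently hosts the cluster group is not stated in the post, and that may be relevant to which of the two cluster IPs can come online.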

Your valuable input will be appreciated!


Windows 2012 clustering in a DMZ

Hello All,

I need help configuring Windows clustering.

1) I have Windows 2008 domain controllers.

2) I have installed Windows 2012, which I want to configure as a failover cluster.

3) Now I want to deploy a Windows 2008 RODC in the DMZ to configure clustering of the servers.

Please let me know whether it is possible to configure Windows 2012 clustering in a DMZ with a Windows 2008 RODC environment.

Thanks in advance

Folder Redirection Error: The requested resource is in use.

I am planning on migrating all of my users' redirected folders off of a single 2008 R2 file server onto a 2012 R2 Failover Cluster. I have created several Continuously Available File Shares to house the new redirected folder locations. I modified Group Policy on a test OU and specified that Windows should move the contents of these folders to the new location. However, it seems to be a bit of a crap-shoot as to whether the files actually get moved or not. Half of them seem to move over just fine, the other half seem to get Event ID 502. For example:

 The following error occurred: "Can not create folder "\\<Path>\Documents"".
 Error details: "The requested resource is in use."

There doesn't seem to be any rhyme or reason as to which folders move and which ones don't, and it seems to be different for every test account I move. Files are never actually in use until the file move action itself is initiated. It almost seems that Windows is stepping on its own toes. How can I prevent this from happening?


cluster fails to reset CNO password in AD

We have a WS2012 Hyper-V cluster. The cluster has DNS name of hvcluster.domain.local, cluster CNO object in AD called hvcluster$, 2 nodes called node1.domain.local (computer account node1$) and node2.domain.local (computer account node2$)

The cluster CNO is in a failed state. As a consequence, its dynamic DNS record is missing and Live Migration doesn't work. The primary problem is that when I use the Repair option on the CNO, the repair will fail with the following error:

"There was an error repairing the active directory object for 'Cluster Name'. Details: There was an error resetting the active directory password for 'Cluster name'. Error code: 0x80005000"

This isn't a new cluster, it's been running for about 2 years now, but this problem manifested recently. I'm aware of the AD requirements for the cluster and for testing purposes I've additionally granted Full Access on the hvcluster computer account to the cluster computer account itself and to both cluster nodes' computer objects (through a group that both nodes are members of).

The account I used for the Repair action (and all other actions) is a member of the Domain Admins group.

Since that didn't help, I've checked that the Authenticated Users group is a member of the local "Users" group on the cluster nodes. Additionally, I've tried modifying local group policy per http://blogs.technet.com/b/askcore/archive/2013/04/04/new-network-name-resource-fails-to-come-online.aspx. That didn't help either.

I've also checked that http://support.microsoft.com/kb/2838043 is installed on both cluster nodes.

From the cluster log (excerpt):

000014a8.00001014::2015/03/03-12:52:32.368 INFO  [RES] Network Name <Cluster Name>: AccountAD: OU name for VCO is OU=Hyper-V,DC=domain,DC=local
000014a8.00001014::2015/03/03-12:52:32.383 INFO  [RES] Network Name:  [NN] Setting crypto access members for decrypt. New container = false.
000014a8.00001014::2015/03/03-12:52:32.383 INFO  [RES] Network Name: [NNLIB] Priming local KDC cache to \\DC01.domain.local for domain domain.local
000014a8.00001014::2015/03/03-12:52:32.383 INFO  [RES] Network Name: [NNLIB] PopulateKerbKDCLookupCache - DC flags 0
000014a8.00001014::2015/03/03-12:52:32.383 INFO  [RES] Network Name: [NNLIB] LsaCallAuthenticationPackage success with a request of size 100, result size 0 (status: 0, subStatus: 0)
000014a8.00001014::2015/03/03-12:52:32.383 INFO  [RES] Network Name: [NNLIB] Priming local KDC cache to \\DC01.domain.local for domain label domain
000014a8.00001014::2015/03/03-12:52:32.383 INFO  [RES] Network Name: [NNLIB] LsaCallAuthenticationPackage success with a request of size 78, result size 0 (status: 0, subStatus: 0)
000014a8.0000227c::2015/03/03-12:52:32.399 INFO  [RES] Network Name <Cluster Name>: Getting Read/Write private properties
000014a8.00001014::2015/03/03-12:52:32.414 WARN  [RES] Network Name: [NNLIB] LogonUserEx fails for user HVCLUSTER$: 1326 (useSecondaryPassword: 0)
000014a8.0000227c::2015/03/03-12:52:32.430 INFO  [RES] Network Name <Cluster Name>: Getting Read only private properties
000014a8.00001014::2015/03/03-12:52:32.446 WARN  [RES] Network Name: [NNLIB] LogonUserEx fails for user HVCLUSTER$: 1326 (useSecondaryPassword: 1)
000014a8.00001014::2015/03/03-12:52:32.446 INFO  [RES] Network Name: [NNLIB] Logon failed for user HVCLUSTER$ (Error 1326), DC \\DC01.domain.local, domain domain.local
000014a8.00001014::2015/03/03-12:52:32.446 ERR   [RES] Network Name:  [NN] GetToken - Logging on as the CNO failed with error 1326
000014a8.00001014::2015/03/03-12:52:32.446 INFO  [RES] Network Name <Cluster Name>: AccountAD: End of Slow Operation, state: Initializing/Writing, prevWorkState: Writing
000014a8.00001014::2015/03/03-12:52:32.446 WARN  [RES] Network Name <Cluster Name>: AccountAD: Slow operation has exception ERROR_INVALID_HANDLE(6)' because of '::ImpersonateLoggedOnUser( GetToken() )'
000014a8.0000227c::2015/03/03-12:52:32.446 INFO  [RES] Network Name: Agent: OnInitializeReply, Failure on (6b0ee668-0731-4252-b066-dd657fd23f25,AccountAD): 6
000014a8.0000227c::2015/03/03-12:52:32.446 INFO  [RES] Network Name <Cluster Name>: Configuration: InitializeReplyCreation of NetName (type Singleton), result: 6, IsCanceled: false
00001fdc.000018ac::2015/03/03-12:52:32.446 INFO  [GEM] Sending 1 messages as a batched GEM message
000014a8.0000227c::2015/03/03-12:52:32.446 INFO  [RES] Network Name <Cluster Name>: Configuration: Setting 'StatusKerberos' in clusdb returned status 0
000014a8.0000227c::2015/03/03-12:52:32.446 INFO  [RES] Network Name <Cluster Name>: Configuration: Deleting ResourceData, CreatingDC, ObjectGUID for a newly created netname from cluster database
00001fdc.000018ac::2015/03/03-12:52:32.446 INFO  [GEM] Sending 1 messages as a batched GEM message
000014a8.000021c4::2015/03/03-12:52:32.461 INFO  [RES] Network Name <Cluster Name>: Getting Read/Write private properties
00001fdc.000018ac::2015/03/03-12:52:32.461 INFO  [GEM] Sending 1 messages as a batched GEM message
000014a8.0000227c::2015/03/03-12:52:32.477 INFO  [RES] Network Name: Agent: OnInitializeReply, Failure on (6b0ee668-0731-4252-b066-dd657fd23f25,Configuration): 6
000014a8.0000227c::2015/03/03-12:52:32.477 INFO  [RES] Network Name <Cluster Name>: SyncReplyHandler Configuration, result: 6
000014a8.00001568::2015/03/03-12:52:32.477 INFO  [RES] Network Name <Cluster Name>: PerformOnline - Initialization of Configuration module finished with result: 6
000014a8.00001568::2015/03/03-12:52:32.477 ERR   [RES] Network Name <Cluster Name>: Online thread Failed: ERROR_SUCCESS(0)' because of 'Initializing netname configuration for Cluster Name failed with error 6.'
000014a8.00001568::2015/03/03-12:52:32.477 INFO  [RES] Network Name <Cluster Name>: All resources offline. Cleaning up.
000014a8.00001568::2015/03/03-12:52:32.477 ERR   [RHS] Online for resource Cluster Name failed.

Any ideas? Btw. I've been through many articles like: https://support.microsoft.com/kb/2838043/, https://social.technet.microsoft.com/forums/windowsserver/en-us/2ad0afaf-8d86-4f16-b748-49bf9ac447a3/ws2012-cluster-network-dns-issues, http://blogs.technet.com/b/askcore/archive/2013/04/04/new-network-name-resource-fails-to-come-online.aspx, http://blogs.technet.com/b/askcore/archive/2012/09/25/cno-blog-series-increasing-awareness-around-the-cluster-name-object-cno.aspx etc.
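
For excerpts like the cluster log above, it can help to filter out the INFO noise and look only at the WARN and ERR lines. The following is a minimal Python sketch, not something from the original post: the regular expression is inferred only from the lines shown here, the names `parse_line` and `problems` are hypothetical, and reading the two leading hex fields as process ID and thread ID is an assumption about the log layout.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Layout inferred from the excerpt: <hex>.<hex>::<timestamp> <LEVEL> [<component>] <message>
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-f]{8})\.(?P<tid>[0-9a-f]{8})::"
    r"(?P<timestamp>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<component>[^\]]+)\]\s+"
    r"(?P<message>.*)$"
)


class Entry(NamedTuple):
    pid: str        # assumed to be the process ID (hex)
    tid: str        # assumed to be the thread ID (hex)
    timestamp: str
    level: str
    component: str
    message: str


def parse_line(line: str) -> Optional[Entry]:
    """Parse one log line; return None if it does not match the layout."""
    match = LINE_RE.match(line.strip())
    return Entry(**match.groupdict()) if match else None


def problems(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield only the WARN and ERR entries, in their original order."""
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry.level in ("WARN", "ERR"):
            yield entry


# A few lines copied from the excerpt above.
SAMPLE = [
    "000014a8.00001014::2015/03/03-12:52:32.383 INFO  [RES] Network Name: [NNLIB] PopulateKerbKDCLookupCache - DC flags 0",
    "000014a8.00001014::2015/03/03-12:52:32.414 WARN  [RES] Network Name: [NNLIB] LogonUserEx fails for user HVCLUSTER$: 1326 (useSecondaryPassword: 0)",
    "000014a8.00001014::2015/03/03-12:52:32.446 ERR   [RES] Network Name:  [NN] GetToken - Logging on as the CNO failed with error 1326",
    "000014a8.00001568::2015/03/03-12:52:32.477 ERR   [RHS] Online for resource Cluster Name failed.",
]

if __name__ == "__main__":
    for entry in problems(SAMPLE):
        print(entry.timestamp, entry.level, entry.component, entry.message)
```

Run against the full excerpt, a filter like this puts the LogonUserEx warnings and the GetToken error (all with code 1326, the Win32 logon-failure code) ahead of the later "failed with error 6" lines, which fits the poster's reading that the CNO password is the primary problem.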

How to configure automatic start option of the Hyper-V role (Windows 2012 R2)

Hi,

Is there a way to configure the automatic start-up option of each virtual machine?

I know that you can't use the VMM settings to set up the Automatic Start action, and the cluster Hyper-V settings only specify the "shutdown" action.

How can I make sure that, after an event like a power recovery affecting all my nodes, all my Hyper-V roles will be started? Thanks

Hyper-V 2012 does not scale and is not stable enough for production use WHO has 200+ VM's with stability? Event ID 1146, 1230, 5120

For years now, we have had event ID 1146 crash nodes in the cluster (RHS process crashes). We have had several paid Microsoft cases open, even one with Premier. In fact, we have one open currently with zero progress in 72 hours (115012612321318).

Is anyone really running 200+ machines out there with Hyper-V with any level of stability in production, or do you have a complete host (event id 1146) or volume (event id 5120) outage every month or so?  

We have applied recommended hotfixes, and gone through the configuration many many times.

My only conclusion is that Hyper-V does not scale. Once we started adding a lot of machines and hosts, we started getting event 5120 (with STATUS_IO_TIMEOUT), which is unacceptable. It causes a huge slowdown or makes an entire volume inaccessible, and impacts EVERY machine on the volume. The other volumes keep working when this happens. In fact, we have a VMware cluster attached to the same SAN with the same host hardware, and it works flawlessly. Both use MPIO, so the timeout is caused by Hyper-V. At one time the load was nearly identical on VMware and Hyper-V: we had 100 machines on each and the same number of hosts. CPU load is tiny, memory is less than 50%, and IO uses 55 disk spindles for normal storage and another 55 for fast storage.

I'm more or less asking the community how to fix this since the support is not working, but I'm guessing there is no fix and this is really not production ready. I would really like to hear from ANYONE (non-sales) who is running 200+ machines without big outages.


Changing share name on a CAFS share

I'm configuring a Continuously Available File Share (CAFS) for General Use. What I would like to do is to have the UNC share name different from the local path name. By default, the New Share Wizard wants them to be the same. Basically what I want is this:

Local path: D:\Shares\Share1
Remote path: \\FileServer\SH1

...but the New Share Wizard in the Failover Cluster Manager wants to use the "Share1" name for both the local and remote paths. This is easy enough to do within Windows if you weren't doing CAFS. So my first thought was simply to go into Windows and change the share name. So I did. It let me add a new share name, and delete the old one perfectly fine. Once I did, I modified the properties of the new share name in the Failover Cluster Manager to "Enable Continuous Availability", and everything seemed happy.

Does anyone know, is this the right way to do this? Or is there a better way?

Windows 2008 R2 cluster query

When I search for a printer from dsa.msc, it shows up under both the active node and the cluster resource, so I get two search results showing two printers.

Can someone tell me whether something is wrong with the cluster configuration, or whether this is the default behavior?


AliahMurfy

Domain Controller on Cluster??

Hi Everyone,

I am doing Active Directory performance testing in a VMware virtual environment, and for that I need to put my DC in an Active/Active cluster configuration. My question to the community is: can we put a DC on a cluster? As far as I know, you can't add a domain controller to a Windows 2012 cluster; correct me if I am wrong. My second question is, if "yes", what are the other ways to put a DC on a cluster/HA?

Thanks in Advance..  

Best design for file services high availability

Hi,

I have a question about designing a highly available file server; we are aiming at providing high availability for general documents, design files, spreadsheets and so on. Our environment uses a mix of Windows 7 and 8 and I am aware of the limitations with SMB transparent failover for those clients. Each node in the FS Cluster will be a virtual server (guest clustering). The host and the guests will be 2012 R2. I understand that there is now a feature for using a shared VHDX between two guest cluster nodes, I can see this being an advantage but my main concern is how to achieve high availability if that shared VHDX becomes corrupt.

Our host is configured using a virtual SAN which replicates synchronously across the two physical nodes. This shared VHDX could exist on that virtual SAN, sure, but what if the VHDX itself gets corrupted and can't be "attached" to the virtual machine, or is formatted?

Is there a way I can utilise both DFS and HA for file server availability? I am struggling to come up with a sensible design which doesn't go crazy with the number of duplicated VHDXs for resiliency (I think 2 VHDXs are enough in this scenario).

Many thanks for any suggestions

Steve


Node removed from the cluster and cannot rejoin the cluster

Hi

I have a problem with a 2-node cluster. We are running Windows 2008 R2 Enterprise Edition. Suddenly one node had some issue with network connectivity and it was removed from the cluster. The console shows node1 as offline. Connectivity between the servers on both the public and private NICs is fine.

Node 'node1' failed to form a cluster. This was because the witness was not accessible. Please ensure that the witness resource is online and available.

Cluster resource 'File Share Witness' in clustered service or application 'Cluster Group' failed.

The File Share Witness is accessible from both nodes. In the cluster log I can see a failed attempt to lock the FSW.

I have restarted the failed node, but the problem still exists.

Thanks in advance for any helpful answers to my issue.


RRAS Cluster - HNV Gateway - Failover of Routingdomains error

We are running several RRAS clusters on 2012 R2 to provide Hyper-V Network Virtualization Gateway service to many VM networks. (RRAS is installed in multi-tenancy mode.)

Sometimes the RRAS cluster becomes inconsistent and the failover of the RRAS routing domains fails.

The event log Microsoft-Windows-RasClusterResource reports the following error every minute:

Importing RemoteAccess service configuration using C:\ClusterStorage\Volume1\XXXXXX_RemoteAccessConfig_2015_02_02_12_13_20_216.config.

Failed to import RemoteAccess service configuration from C:\ClusterStorage\Volume1\XXXXXX_RemoteAccessConfig_2015_02_02_12_13_20_216.config.

The Cluster log reports the following error:

00000bbc.00000e80::2015/02/02-12:31:57.469 INFO  [RES] RAS Cluster Resource <RasResource>: RasClusterImportConfiguration: Importing RAS Configuration 'C:\ClusterStorage\Volume1\XXXXXXX_RemoteAccessConfig_2015_02_02_12_13_20_216.config.
00000bbc.00000e80::2015/02/02-12:31:57.469 INFO  [RES] RAS Cluster Resource <RasResource>: RasClusterWmiUtil::ImportRasConfiguration++
00000bbc.00000e80::2015/02/02-12:31:57.469 INFO  [RES] RAS Cluster Resource <RasResource>: RasClusterWmiUtil::ImportRasConfiguration: Reading the RAS config object from 'C:\ClusterStorage\Volume1\XXXXXXX_RemoteAccessConfig_2015_02_02_12_13_20_216.config'...
00000bbc.00000e80::2015/02/02-12:31:57.469 INFO  [RES] RAS Cluster Resource <RasResource>: RasClusterWmiUtil::ReadWmiObjectFromFile++
00000bbc.00000e80::2015/02/02-12:31:57.477 INFO  [RES] RAS Cluster Resource <RasResource>: RasClusterWmiUtil::ReadWmiObjectFromFile--
00000bbc.00000e80::2015/02/02-12:31:57.505 ERR   [RES] RAS Cluster Resource <RasResource>: RasClusterWmiUtil::ImportRasConfiguration:MI_Operation_GetInstance failed, error = 'MI_RESULT_NOT_FOUND'
00000bbc.00000e80::2015/02/02-12:31:57.505 INFO  [RES] RAS Cluster Resource <RasResource>: RasClusterWmiUtil::ImportRasConfiguration--
00000bbc.00000e80::2015/02/02-12:31:57.505 ERR   [RES] RAS Cluster Resource <RasResource>: RasClusterImportConfiguration: Failed to Import RAS configiration from 'C:\ClusterStorage\Volume1\XXXXXXX_RemoteAccessConfig_2015_02_02_12_13_20_216.config': 31.

There is no way to refresh the policies on the second node, HNV is also broken for several VM networks, and management through VMM is no longer possible.

Is there any way to reset the node so it gets the correct routing information again?


Cluster File Server Migration

Dears,

My source cluster is based on Windows 2008 R2 with SP1, and my new target cluster is Windows 2012 R2.

When I run the copy wizard, it validates whether I can migrate the file server; however, it flags these items with a yellow mark:

  • File Server IP Address is not eligible to be copied.
  • FileServer01 is not eligible to be copied.

    Is this normal?

How do I continue migrating the file server role?

Set cluster name and IP address ONLINE

Hi

I had to shut down all cluster nodes. After turning all the nodes back on, I saw the following in the cluster management console:

Basic cluster resources:

Name: CLuster1  State: Offline

IP Address: xx.xx.xx.xx  State: Offline

How do I bring these cluster resources (cluster name, IP address) online using PowerShell?


Kind Regards Tomasz

Migration of VMs from WS2012 Hyper-V Hosts Cluster to WS2012 R2 Hyper-V Hosts Cluster

Hello All,

We’re currently running our production VMs on a Failover Cluster of Windows Server 2012 Hyper-V Hosts. We’re planning to migrate these VMs to the cluster of Windows Server 2012 R2 Hyper-V Hosts.

I have created a failover cluster of Windows Server 2012 R2 Hyper-V Hosts, and successfully tested the HA of my new test VMs on this new cluster.

Could anyone please tell me the procedure, steps and best practices for migrating these VMs from the Windows Server 2012 Hyper-V hosts to the Windows Server 2012 R2 Hyper-V hosts?

Thank you.


Regards,

Hasan Bin Hasib

