Channel: High Availability (Clustering) forum
Viewing all 6672 articles

Failover cluster manager

Hi Guys,

I am getting the error message below while validating a configuration in Failover Cluster Manager.


I have checked the services, and the cluster services are started. I have rebooted both cluster servers but still get the same error message. Any further help would be appreciated.


Thanks


How to Load Balance Event Collection

Hi, 

I got Windows Event Forwarding working, but how can I load balance a huge number of source computers so that they forward their events to two or more servers in my domain?

I'm using Windows Server 2012 R2 as my collectors and Windows Server 2008 R2 and later as my forwarders.

Thanks for your replies.
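
One possible way to think about the split, as a rough sketch rather than a Windows Event Forwarding feature: hash each source computer name and use the hash to pick one of the collectors, so the assignment is stable and roughly even. The host names below are hypothetical, and how each group is then pointed at its collector (for example through separate Group Policy settings) is outside this sketch.

```python
import hashlib

def assign_collector(source_host, collectors):
    """Pick a collector for a source computer by hashing its name."""
    # A stable hash (not Python's per-process randomized hash()) keeps the
    # mapping identical between runs. Names are compared case-insensitively.
    digest = hashlib.sha256(source_host.lower().encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(collectors)
    return collectors[index]

def partition_sources(sources, collectors):
    """Group source computers by the collector they were assigned to."""
    groups = {collector: [] for collector in collectors}
    for host in sources:
        groups[assign_collector(host, collectors)].append(host)
    return groups

# Hypothetical names, for illustration only.
collectors = ["collector01.example.internal", "collector02.example.internal"]
sources = ["pc%04d.example.internal" % n for n in range(1000)]

for collector, hosts in partition_sources(sources, collectors).items():
    print(collector, len(hosts))
```

Adding or removing a collector reshuffles most assignments with this simple modulo scheme, so it suits a fixed set of collectors better than one that changes often.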


Cluster CSV had errors, failed over, I see two volumes appear under CSV one is "unknown"

So, we've been having issues with one of our clusters. Yesterday evening, when no one was working, a bunch of VMs went down. I found errors in a couple of event logs that suggest the CSV failed, but I can't find any indication as to why. My storage appliance has no record of any problems at that time, and I can't find any other possible reasons apart from a problem within the cluster.

All six nodes are running up-to-date Server 2012 R2 and are managed by SCVMM 2012 R2, which runs on a virtual machine hosted by another cluster. My storage is a Tegile ZEBI unit, and I've thin-provisioned 20TB of disk space. The disk is accessed over iSCSI on NICs and switches that are separate from the normal cluster/VM traffic.

Below are the errors and a screenshot of an "unknown" volume listed under my CSV, which seems odd. In Failover Cluster Manager, under Storage\Disks, after selecting my CSV, I see two volumes listed in the bottom pane:

In cluster manager, I found this error:

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          2014-12-16 6:41:33 PM
Event ID:      5120
Task Category: Cluster Shared Volume
Level:         Error
Keywords:      
User:          SYSTEM
Computer:      CLUSTERHOST4.DOMAIN.INTERNAL
Description:
Cluster Shared Volume 'Volume 1' ('CSV') has entered a paused state because of '(c000000b5)'. All I/O will temporarily be queued until a path to the volume is reestablished.


I went to the node that owned the CSV, and in the event log I found this error:

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          2014-12-16 6:48:22 PM
Event ID:      1230
Task Category: Resource Control Manager
Level:         Error
Keywords:      
User:          SYSTEM
Computer:      CLUSTERHOST1.DOMAIN.INTERNAL
Description:
A component on the server did not respond in a timely fashion. This caused the cluster resource 'CSV' (resource type 'Physical Disk', DLL 'clusres.dll') to exceed its time-out threshold. As part of cluster health detection, recovery actions will be taken. The cluster will try to automatically recover by terminating and restarting the Resource Hosting Subsystem (RHS) process that is running this resource. Verify that the underlying infrastructure (such as storage, networking, or services) that are associated with the resource are functioning correctly.


Then this error:

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          2014-12-16 6:59:39 PM
Event ID:      1146
Task Category: Resource Control Manager
Level:         Critical
Keywords:      
User:          SYSTEM
Computer:      CLUSTERHOST1.DOMAIN.INTERNAL
Description:
The cluster Resource Hosting Subsystem (RHS) process was terminated and will be restarted. This is typically associated with cluster health detection and recovery of a resource. Refer to the System event log to determine which resource and resource DLL is causing the issue.


Then this error:

Log Name:      System
Source:        Microsoft-Windows-Ntfs
Date:          2014-12-16 7:01:03 PM
Event ID:      140
Task Category: None
Level:         Warning
Keywords:      (8)
User:          SYSTEM
Computer:      CLUSTERHOST1.DOMAIN.INTERNAL
Description:
The system failed to flush data to the transaction log. Corruption may occur in VolumeId: VirtualMachines, DeviceName: \Device\HarddiskVolume7.
(A device which does not exist was specified.)
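
Since the events above come from different nodes, it can help to put them on a single timeline. The following is a minimal sketch, assuming the events are pasted as text in the same "Field: value" layout and the same date format shown above; the function names are made up for illustration.

```python
import re
from datetime import datetime

# Header fields of a pasted Event Viewer entry that the sketch keeps.
FIELD = re.compile(r"^(Log Name|Source|Date|Event ID|Level|Computer):\s*(.*)$")

def parse_events(text):
    """Split pasted Event Viewer text into one dict per event."""
    events, current = [], None
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if not match:
            continue  # description text and other fields are skipped
        key, value = match.group(1), match.group(2).strip()
        if key == "Log Name":  # each pasted event starts with this field
            current = {}
            events.append(current)
        if current is not None:
            current[key] = value
    return events

def timeline(events):
    """Sort events by time, assuming dates like '2014-12-16 6:41:33 PM'."""
    fmt = "%Y-%m-%d %I:%M:%S %p"
    return sorted(events, key=lambda e: datetime.strptime(e["Date"], fmt))

# Two of the events from the post, deliberately out of order.
sample = """\
Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          2014-12-16 6:48:22 PM
Event ID:      1230
Level:         Error
Computer:      CLUSTERHOST1.DOMAIN.INTERNAL
Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          2014-12-16 6:41:33 PM
Event ID:      5120
Level:         Error
Computer:      CLUSTERHOST4.DOMAIN.INTERNAL
"""

for event in timeline(parse_events(sample)):
    print(event["Date"], event["Computer"], event["Event ID"], event["Level"])
```

Sorted this way, the 5120 event on CLUSTERHOST4 comes before the 1230 event on CLUSTERHOST1, which matches the timestamps in the post.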




Cluster Resource offline, Windows 2008R2

Hi All, 

I have cluster nodes.
My cluster was working fine until last week; after that it failed to bring the resource online.
I see that the cluster IP address is online, but the cluster name is not.
I have checked from the DNS side: the cluster name has a host entry, and both nodes are communicating with the DC.
When I try to bring the resource online, it fails.
Please help.

BSODs: DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)

Hi there,

I have a problem when I mount a disk on Windows: a BSOD occurs when I attempt to manage the disk.

WinDbg analyzes the minidump file as follows:

*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck D1, {14, 2, 0, fffff880011c61c2}

Probably caused by : msdsm.sys ( msdsm!DsmpQueryLoadBalancePolicy+232 )

Followup: MachineOwner
---------

24: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 0000000000000014, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: fffff880011c61c2, address which referenced memory

Debugging Details:
------------------


READ_ADDRESS: GetPointerFromAddress: unable to read from fffff800018c8100
 0000000000000014

CURRENT_IRQL:  2

FAULTING_IP:
msdsm!DsmpQueryLoadBalancePolicy+232
fffff880`011c61c2 8b4814          mov     ecx,dword ptr [rax+14h]

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  DRIVER_FAULT_SERVER_MINIDUMP

BUGCHECK_STR:  0xD1

PROCESS_NAME:  mmc.exe

TRAP_FRAME:  fffff8800d14b0b0 -- (.trap 0xfffff8800d14b0b0)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000001
rdx=fffff880011cb110 rsi=0000000000000000 rdi=0000000000000000
rip=fffff880011c61c2 rsp=fffff8800d14b240 rbp=fffffa803dcbb770
 r8=fffffa808a2f3a78  r9=fffffa803dcbbb40 r10=0000000000000000
r11=fffff8800d14b200 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei pl nz na pe nc
msdsm!DsmpQueryLoadBalancePolicy+0x232:
fffff880`011c61c2 8b4814          mov     ecx,dword ptr [rax+14h] ds:00000000`00000014=????????
Resetting default scope

LAST_CONTROL_TRANSFER:  from fffff80001690169 to fffff80001690bc0

STACK_TEXT:
fffff880`0d14af68 fffff800`01690169 : 00000000`0000000a 00000000`00000014 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx
fffff880`0d14af70 fffff800`0168ede0 : 00000000`00000001 fffff880`0d14b120 00000000`00000008 00000000`00000000 : nt!KiBugCheckDispatch+0x69
fffff880`0d14b0b0 fffff880`011c61c2 : fffffa80`8a2f3a68 fffffa80`3dcbb770 fffffa80`3dcbbd60 fffffa80`8a2f3a68 : nt!KiPageFault+0x260
fffff880`0d14b240 fffff880`011c5ed8 : 00000000`c0000295 00000000`00000000 00000000`00000004 fffffa80`3dcbb770 : msdsm!DsmpQueryLoadBalancePolicy+0x232
fffff880`0d14b2a0 fffff880`01216001 : 00000000`00000000 fffffa80`3dcbb1b0 fffffa80`3dcbb1b0 ffffee4c`f9650188 : msdsm!DsmQueryData+0x23c
fffff880`0d14b310 fffff880`0138b28e : fffffa80`3dcbb060 00000000`00000000 fffffa80`00000004 fffff8a0`00000000 : mpio!MPIODsmQueryDataBlock+0x1b1
fffff880`0d14b4b0 fffff880`01217935 : fffffa80`8a3e9450 fffff800`016764c3 fffffa80`8a3e9450 fffff880`0d14b610 : WMILIB!WmiSystemControl+0x286
fffff880`0d14b5a0 fffff880`01201766 : fffff8a0`00000000 fffffa80`8a3e9450 fffffa80`8a3e9450 fffff880`0d14b700 : mpio!MPIOPdoWmi+0x79
fffff880`0d14b610 fffff880`017d4e0b : fffffa80`8a3e9450 fffffa80`8a3e9450 fffffa80`3dcc6910 00000000`c00000bb : mpio!MPIOWmiDispatch+0x12
fffff880`0d14b640 fffff880`012365db : fffffa80`3dccf060 fffffa80`8a3e9450 fffffa80`8a3e9450 fffff880`0d14b730 : CLASSPNP! ?? ::NNGAKEGL::`string'+0xab7
fffff880`0d14b700 fffff800`0192912c : fffffa80`00000003 fffffa80`3dcbb060 fffffa80`8a2f3901 fffffa80`8a2f3901 : partmgr!PmSystemControl+0xab
fffff880`0d14b730 fffff800`01a69943 : fffffa80`8a35fc10 00000000`00000000 fffffa80`8a3e9480 fffffa80`8a2f3901 : nt!WmipForwardWmiIrp+0x16c
fffff880`0d14b7b0 fffff800`019d6b9b : fffffa80`8a1ac000 fffffa80`423ec290 00000000`00000238 00000000`00000000 : nt!WmipQuerySetExecuteSI+0x293
fffff880`0d14b8c0 fffff800`019ade67 : fffffa80`423ec290 fffff880`0d14bca0 fffff880`0d14bca0 fffffa80`8a3e9450 : nt! ?? ::NNGAKEGL::`string'+0x2def9
fffff880`0d14ba10 fffff800`019ae6c6 : 00000000`00000000 00000000`00000fc4 00000000`00000000 00000000`00000000 : nt!IopXxxControlFile+0x607
fffff880`0d14bb40 fffff800`0168fe53 : 00000000`001ad90a fffff880`0d14bad0 00000000`00000000 00001bb4`00000000 : nt!NtDeviceIoControlFile+0x56
fffff880`0d14bbb0 00000000`7733132a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13
00000000`001ad768 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x7733132a

24: kd> lmvm mpio
start             end                 module name
fffff880`010aa000 fffff880`010d4000   mpio       (pdb symbols)          c:\symbolfilepath\mpio.pdb\4C1A5F67D37E4544A84B41A8BB4B77121\mpio.pdb
    Loaded symbol image file: mpio.sys
    Mapped memory image file: C:\symbolFilePath\mpio.sys\4CE7A47A2a000\mpio.sys
    Image path: \SystemRoot\system32\DRIVERS\mpio.sys
    Image name: mpio.sys
    Timestamp:        Sat Nov 20 18:35:38 2010 (4CE7A47A)
    CheckSum:         00034DF8
    ImageSize:        0002A000
    File version:     6.1.7601.17514
    Product version:  6.1.7601.17514
    File flags:       0 (Mask 3F)
    File OS:          40004 NT Win32
    File type:        3.7 Driver
    File date:        00000000.00000000
    Translations:     0409.04b0
    CompanyName:      Microsoft Corporation
    ProductName:      Microsoft® Windows® Operating System
    InternalName:     mpio.sys
    OriginalFilename: mpio.sys
    ProductVersion:   6.1.7601.17514
    FileVersion:      6.1.7601.17514 (win7sp1_rtm.101119-1850)
    FileDescription:  MultiPath Support Bus-Driver
    LegalCopyright:   © Microsoft Corporation. All rights reserved
24: kd> .frame /r 3
03 fffff880`0ded5240 fffff880`01346ed8 msdsm!DsmpQueryLoadBalancePolicy+0x232
rax=fffff8800ded5070 rbx=0000000000000000 rcx=000000000000000a
rdx=0000000000000014 rsi=fffffa803def8a00 rdi=0000000000000001
rip=fffff880013471c2 rsp=fffff8800ded5240 rbp=fffffa803def9dd0
 r8=0000000000000002  r9=0000000000000000 r10=fffff880013471c2
r11=0000000000000000 r12=0000000000000000 r13=0000000000000002
r14=fffffa803d52dc40 r15=fffffa803def9dd8
iopl=0         nv up ei ng nz na pe nc
cs=0010  ss=0000  ds=002b  es=002b  fs=0053  gs=002b             efl=00000282
msdsm!DsmpQueryLoadBalancePolicy+0x232:
fffff880`013471c2 8b4814          mov     ecx,dword ptr [rax+14h] ds:002b:fffff880`0ded5084=fffffa80

I do not know why this happens. Can anybody give me some suggestions?

Note: you can get the minidump file from http://pan.baidu.com/s/1i3vdRqX

thanks.
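
One way to read the trap frame above, as a small worked check rather than a diagnosis: the faulting instruction reads from [rax+14h], and the trap frame shows rax as zero, so the address it touches should equal Arg1 of the bugcheck. The helper name is made up for illustration; the values are copied from the dump.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # addresses wrap at 64 bits

def effective_address(base, displacement):
    """Address accessed by an operand of the form [base + displacement]."""
    return (base + displacement) & MASK64

rax = 0x0000000000000000      # rax in the trap frame
displacement = 0x14           # from: mov ecx, dword ptr [rax+14h]
arg1_memory_referenced = 0x14 # Arg1 of BugCheck D1

address = effective_address(rax, displacement)
print(hex(address))                       # 0x14
print(address == arg1_memory_referenced)  # True
```

This is consistent with msdsm!DsmpQueryLoadBalancePolicy reading a field at offset 0x14 through a pointer that was NULL at that moment. Why the pointer was NULL is not something this arithmetic can show.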

DSM QFE Number mismatch on windows 2012

Hi, my validation tests are failing with the error below on MPIO disks on a clustered Windows Server 2012 Standard server.

For the device-specific module (DSM) named Microsoft DSM,
versions do not match between node abc.xyz.com and node abc1.xyz-04.xyz.com.
For the device-specific module (DSM) named Microsoft DSM,
versions do not match between node abc.xyz.com and node
abc1.xyz-04.xyz.com
Stop: 12/9/2014 11:44:10 PM.

Indicates two revision levels are incompatible

The patch levels on both nodes are the same, but there is a QFE number mismatch. Any ideas on how to bring them to the same version?


Faizan
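
To narrow down which update differs between the two nodes, one option is to compare the installed hotfix lists side by side. The sketch below assumes the lists have already been collected from each node (for example from Get-HotFix output); the KB identifiers are placeholders, not real updates.

```python
def diff_hotfixes(node_a, node_b):
    """Return (only on node A, only on node B) as sorted lists."""
    a, b = set(node_a), set(node_b)
    return sorted(a - b), sorted(b - a)

# Placeholder hotfix IDs, for illustration only.
node1_hotfixes = ["KB0000001", "KB0000002", "KB0000003"]
node2_hotfixes = ["KB0000001", "KB0000003", "KB0000004"]

only_node1, only_node2 = diff_hotfixes(node1_hotfixes, node2_hotfixes)
print("Only on node 1:", only_node1)
print("Only on node 2:", only_node2)
```

Any hotfix that appears on only one side is a candidate for the QFE that changed the DSM version on that node.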

Multi-Site Highly Available File Server

Hi All,

I am planning to deploy a highly available file server spanning 3 sites using Windows Server 2012 R2. The 3 sites are located quite close to each other.

Any suggestions or recommendations?

Thank You In Advance

Failover Cluster in a box

Hi,

I'm in the process of building a test environment which requires a failover cluster for a SQL Server AlwaysOn configuration of 2 DB nodes. I have a single Windows 2012 R2 host server with Hyper-V installed, and my guest OSes are configured using local storage on this server.

In order to create the guest failover cluster, I need to create a quorum drive, which needs to be shared between the two DB nodes. Using only this physical host server and its local storage, is this possible to achieve?

I believe that the options for creating a shared VHDX disk are to host it on a CSV or a Scale-Out File Server, but both of these seem to require the creation of a cluster themselves, which will itself require a quorum disk on shared storage!

Thanks, 

David



Cluster Node Unable to Maintain Cluster Membership

My cluster logs are very similar to the above thread... was it ever addressed?

[SV] Already protecting connection with message security level 'sign'

[FTI] Stream already exists to node: false

[Channel IP to another cluster node member] Close()

GracefuleClose(1226) because of channel to remote endpoint another cluster node~ is closed

The Cluster service stops and generates:

The Kerberos client received a KRB_AP_ERR_MODIFIED error from the server serverName$. The target name used was serverName. This indicates that the target server failed to decrypt the ticket provided by the client. This can occur when the target server principal name (SPN) is registered on an account other than the account the target service is using. Ensure that the target SPN is only registered on the account used by the server.


Roderick Lyons

Cannot browse Internet for Microsoft NLB in VMWare

Hi,

I have set up two Windows 2012 R2 virtual web servers in a Microsoft Network Load Balancing cluster. The cluster is configured in multicast mode.

At the firewall level, the cluster IP is NATed to a public IP, which allows Internet access. From either server I can ping the other server and any other server on the same subnet. However, I can't ping Google DNS (8.8.8.8) from either web server.

Server 1: 192.168.5.2

Server 2: 192.168.5.3

Cluster IP: 192.168.5.4, NATed to a public IP

Has anyone encountered this before?

Kindly advise.

Thanks,

Shawn
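
When checking how the firewall and switches see the cluster IP, it may help to know which MAC address the cluster answers with. As a hedged sketch: in NLB multicast mode (without IGMP) the cluster MAC is normally derived from the cluster IP as 03-BF followed by the four IP octets in hex. The function name is made up for illustration, and the actual value should be confirmed in NLB Manager on the cluster itself.

```python
import ipaddress

def nlb_multicast_mac(cluster_ip):
    """Derive the usual NLB multicast-mode MAC (03-BF-w-x-y-z) from an IPv4 cluster IP."""
    octets = ipaddress.IPv4Address(cluster_ip).packed  # 4 raw bytes of the address
    return "-".join("%02X" % b for b in (0x03, 0xBF) + tuple(octets))

print(nlb_multicast_mac("192.168.5.4"))  # 03-BF-C0-A8-05-04
```

This only describes traffic addressed to the cluster IP. Outbound pings from the nodes' own addresses (192.168.5.2 and 192.168.5.3) depend on those addresses having their own route and NAT rule.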

Clussvc.exe CPU spike on Windows 2012 R2

Hello,

Has anyone experienced CPU spikes from cluster.exe on a Windows cluster (Windows 2012 R2, SQL Server 2014, all updates installed)?

The symptom is that every couple of minutes the CPU spikes, and the spike is driven by cluster.exe. The environment only has SQL Server running, no SCOM, etc.

Microsoft is following this case with me now, but there has been no progress so far. I'm just wondering if you have the same issue.

Thanks,

Albert

JBOD resiliency

I am implementing a Scale-Out File Server cluster. All of my hardware is certified in the Windows Server Catalog.

I have 3 physical JBODs, each with 24 bays (3.5in drives). Running Get-StorageEnclosure shows 6 enclosures.

I am told by the JBOD vendor that I actually have 6 enclosures, due to the fact that I have redundant I/O modules.

Does this mean I can only lose 1 JBOD/controller with enclosure-aware virtual disks? Should I set this up another way to get more resiliency?

Thank you.

Windows 2012 R2 Cluster

I have a 5-node cluster with a shared clustered disk connected to a Dell EqualLogic PS4000i. I was doing maintenance on one of the nodes (node 3), which happened to be the owner node of the Cluster Shared Volume. I manually live migrated all the virtual machines one at a time. Once that completed, I selected Pause and then the option to drain the roles, to move the Cluster Shared Volume to another server.

While this was in progress, I noticed that all of the virtual machines started to go offline. Failover Cluster Manager showed that ownership had moved to node 1. When I checked the status on node 1, the clustered disk was showing Online (No Access). Any thoughts? It took me about 15 minutes to get things back up, and I had to manually move the disk to another node before it connected successfully.

MsLbfoProvider, ID 16945 on Windows Server 2012 R2: causes, effects, countermeasures

Hi experts,

I would like to ask about the causes, effects, and countermeasures for MsLbfoProvider, ID 16945, on Windows Server 2012 R2.

Thanks in advance

Node and File Share Majority : The user name or password is incorrect

Hi all,

I have a file share which I want to use as the file share witness for my Microsoft cluster (currently Node Majority).

The error is:

An error was encountered while modifying the quorum settings.
Your cluster quorum settings have not been changed.
There was an error configuring the file share witness '\\domainame\QW'.
Unable to save property changes for 'File Share Witness'.
The user name or password is incorrect

The computer accounts of the nodes have Modify permission on both the share and NTFS.

The only catch is that the nodes and the server hosting the file share are on different networks (across ACLs), though SMB (TCP & UDP 445) is open between the networks to the file share server and vice versa.

Any suggestions are much appreciated.

Regards,

Ramu


Ramu V Ramanan

Remove a volume

Hi,

I've made a mistake during an operation (using Veritas Storage Foundation) and would like to remove a volume from a cluster.

Veritas doesn't allow me to remove the volume in its interface until I remove it from the cluster.


Do you know if I can do it (without bringing any resource offline)?

Thanks

SQL Clustered Servers

I’ve installed Windows Server 2012 R2 with the Failover Clustering feature and have configured two network interfaces. On the heartbeat NIC of each server I see a “Microsoft Failover Cluster Virtual Adapter Performance Filter”, which is not enabled for the NIC. Should I enable this for the heartbeat NICs of the servers in my cluster?

Any and all thoughts would be greatly appreciated.


Leonard Hoffman

The lease timeout between availability group and the Windows Server Failover Cluster has expired

Hi,

I am having some issues where I get a lease timeout from time to time. I have a Windows 2012 failover cluster with 2 nodes and 2 SQL Server 2012 AlwaysOn Availability Groups. Both nodes are physical machines, and each node is the primary for one AG.

From what I understand, if the HealthCheckTimeout is exceeded without the signal exchange, the lease is declared 'expired' and the SQL Server resource DLL reports that the SQL Server availability group no longer 'looks alive' to the Windows cluster manager. Here are the properties I have set up, which are the default settings:

LeaseTimeout - 20000

HealthCheckTimeout - 30000

VerboseLogging - 0

FailureConditionLevel - 3

Here are the events that occur in the Application Event Viewer:

Event ID 19407:

The lease between availability group 'AG_NAME' and the Windows Server Failover Cluster has expired. A connectivity issue occurred between the instance of SQL Server and the Windows Server Failover Cluster. To determine whether the availability group is failing over correctly, check the corresponding availability group resource in the Windows Server Failover Cluster.

Event ID 35285:

The recovery LSN (120881:37533:1) was identified for the database with ID 32. This is an informational message only. No user action is required.

The SQL Server logs are too long to post in this box, but I can send them on request.

The AG is set up to fail over automatically, but it did not fail over. I am trying to figure out why the lease timed out. Thanks.

MSCS2012r2 disk issue FC

Hi Folks!

 

I have to troubleshoot a newly installed MSCS 2012 R2 on ESXi 5.5 (MSCS nodes on different boxes).

Our storage vendor is NetApp, in an FC environment.

The system disk of each Windows 2012 R2 node is based on a vmdk file. The other partitions (quorum, MSDTC, data) are built on raw device mappings.

The LUN type for the raw device mappings on the NetApp is "Windows 2008 or later".

On the MSCS nodes, the following warnings frequently appear in the event log:

Event ID: 50, NTFS, Warning
{delayed write failed} Windows was unable to save all the data for the file. The data has been lost. This error may be caused by a failure of your computer hardware or network connection. Please try to save this file elsewhere.

Event ID: 140, NTFS, Warning
The system failed to flush data to the transaction log. Corruption may occur in VolumeId:<> DeviceName: \Device\HarddiskVolume ({device busy} The device is busy at the moment)

Event ID: 153, Disk
The IO operation at logical block address “” for Disk “” was retried.

 

The case is also open with NetApp and VMware, but they have no ideas at the moment. The ESX and NetApp environment works fine for other clusters based on W08r2.

