Channel: High Availability (Clustering) forum
Viewing all 6672 articles

Live Migrations of VMs with VHD disks / Long blackout times

Hi all, I'm having issues with live migrations of virtual machines that have .vhd files in our Windows Server 2012 R2 Hyper-V clusters. When I live migrate a VM that has a .vhd disk file, it randomly pauses at around 60% for about 90 seconds and then completes the migration. While it is paused, the VM no longer responds to pings. The Hyper-V logs show Event 20417, which says the VM had an unexpectedly long blackout time of 110 seconds. It's usually around 110 seconds. The live migration completes, but it's causing problems for applications running on the VMs, especially the SQL servers. It took a while to narrow this down: I only see it on VMs with .vhd files. We migrated over 60 VMs from a Windows Server 2008 R2 cluster to our Windows Server 2012 R2 Hyper-V cluster, which is why some VMs still have .vhd files.

Is anyone else experiencing this problem? I'm looking for a way to stop it from occurring. I understand that I can convert the .vhd files to .vhdx files, but I've got a lot of VMs that I'd have to do this for.

Any feedback appreciated.

    


The IRPStackSize parameter on Windows 2012

Hello All:

 I need your help.

 Look at this article: http://support.microsoft.com/default.aspx?scid=kb;EN-US;285089

 It describes a setting that we use on clustered Windows 2008 R2 servers, with a value of 20.

 Details: "The IRPStackSize parameter specifies the number of stack locations in I/O request packets (IRPs) that are used by Windows 2000 Server, by Windows Server 2003, and by Windows XP. You may have to increase this number for certain transports, for media access control (MAC) drivers, or for file system drivers. Each stack uses 36 bytes of memory for each receive buffer. This value is set in the following registry subkey:

  HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters

The default value of the IRPStackSize parameter is 15. The range is from 11 (0xb hexadecimal) through 50 (0x32 hexadecimal)."
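As a rough illustration of the figures quoted above, here is a minimal Python sketch that checks a candidate value against the quoted range and estimates the extra memory relative to the default. The function names and the receive-buffer count are hypothetical; only the numbers 11, 50, 15, and 36 come from the quoted article, and whether they still apply to Windows 2012 is exactly the open question.

```python
# Figures taken from the quoted KB text; everything else is illustrative.
IRP_STACK_MIN = 11        # 0xb
IRP_STACK_MAX = 50        # 0x32
IRP_STACK_DEFAULT = 15
BYTES_PER_STACK = 36      # per stack, per receive buffer


def is_valid_irp_stack_size(value):
    """Return True if the value lies inside the quoted range."""
    return IRP_STACK_MIN <= value <= IRP_STACK_MAX


def extra_bytes_over_default(value, receive_buffers):
    """Estimate memory added compared with the default of 15.

    receive_buffers is a hypothetical input; the article does not say
    how many receive buffers a given server uses.
    """
    if not is_valid_irp_stack_size(value):
        raise ValueError("IRPStackSize must be between 11 and 50")
    return (value - IRP_STACK_DEFAULT) * BYTES_PER_STACK * receive_buffers


if __name__ == "__main__":
    # The value 20 mentioned in the post, with an assumed 100 receive buffers.
    print(is_valid_irp_stack_size(20))
    print(extra_bytes_over_default(20, 100))
```

With the value 20 from the post, the increase over the default is 5 stacks, so the cost scales as 5 x 36 bytes per receive buffer.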

 My question is whether this parameter is still valid on Windows 2012, and whether there is an official Microsoft article that covers it.

 Thanks in advance for your answers and comments.

Regards,


Felipe Román http://feliperoman.wordpress.com

DAG Kerberos Authentication Issue Exchange 2010 on 2008R2 Servers

I have 2 Exchange 2010 servers in a DAG. The witness server is in site A along with one of the Exchange servers. The second Exchange server is in a DR site. The DAG has been functioning fine for 1.5 years. Last weekend, after a scheduled reboot of all 3 servers involved (2 e-mail servers and the witness server), the e-mail server in the DR site could no longer access the witness share directory, according to Failover Cluster Manager. It says to check whether the witness directory is online, etc. Using pings and Explorer, the DR site e-mail server has no problem contacting the witness server and directory. I even re-established the quorum to the same directory, with no issues. A network trace, though, shows Kerberos pre-authentication errors when I start the Cluster service on the DR site e-mail server and it tries to contact the witness server:

(1.4 is the Witness server; 6.5 is the e-mail server in the DR site)

"Source","Destination","Protocol","Length","Info"

"192.168.1.4","192.168.6.5","KRB5","319","KRB Error: KRB5KDC_ERR_PREAUTH_REQUIRED"
"192.168.6.5","192.168.1.4","TCP","54","26049 > kerberos [FIN, ACK] Seq=235 Ack=266 Win=65792 Len=0"
"192.168.6.5","192.168.1.4","TCP","66","26050 > kerberos [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1"
"192.168.1.4","192.168.6.5","TCP","60","kerberos > 26049 [ACK] Seq=266 Ack=236 Win=66048 Len=0"
"192.168.1.4","192.168.6.5","TCP","60","kerberos > 26049 [RST, ACK] Seq=266 Ack=236 Win=0 Len=0"
"192.168.1.4","192.168.6.5","TCP","66","kerberos > 26050 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1406 WS=256 SACK_PERM=1"
"192.168.6.5","192.168.1.4","TCP","54","26050 > kerberos [ACK] Seq=1 Ack=1 Win=66048 Len=0"
"192.168.6.5","192.168.1.4","KRB5","368","AS-REQ"
"192.168.1.4","192.168.6.5","KRB5","282","KRB Error: KRB5KDC_ERR_PREAUTH_FAILED"
"192.168.6.5","192.168.1.4","TCP","54","26050 > kerberos [FIN, ACK] Seq=315 Ack=229 Win=65792 Len=0"
"192.168.1.4","192.168.6.5","TCP","60","kerberos > 26050 [ACK] Seq=229 Ack=316 Win=66048 Len=0"
"192.168.1.4","192.168.6.5","TCP","60","kerberos > 26050 [RST, ACK] Seq=229 Ack=316 Win=0 Len=0"
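To pick the Kerberos errors out of a longer export of this kind, a small Python sketch along the following lines may help. It assumes each row holds five quoted, comma-separated fields (source, destination, protocol, length, info) and tolerates a missing leading quote. The function name and the shortened sample are hypothetical.

```python
import csv

# A few rows copied from the trace in the post (shortened sample).
SAMPLE_TRACE = '''192.168.1.4","192.168.6.5","KRB5","319","KRB Error: KRB5KDC_ERR_PREAUTH_REQUIRED"
192.168.6.5","192.168.1.4","TCP","54","26049 > kerberos [FIN, ACK] Seq=235 Ack=266 Win=65792 Len=0"
192.168.6.5","192.168.1.4","KRB5","368","AS-REQ"
192.168.1.4","192.168.6.5","KRB5","282","KRB Error: KRB5KDC_ERR_PREAUTH_FAILED"
'''


def kerberos_errors(trace_text):
    """Return (source, destination, error_name) for each KRB5 error row."""
    lines = []
    for line in trace_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Restore the opening quote if the row was pasted without it.
        if not line.startswith('"'):
            line = '"' + line
        lines.append(line)

    errors = []
    for row in csv.reader(lines):
        if len(row) != 5:
            continue  # skip headers or rows with an unexpected layout
        source, destination, protocol, _length, info = row
        if protocol == "KRB5" and info.startswith("KRB Error:"):
            errors.append((source, destination, info.split(":", 1)[1].strip()))
    return errors


if __name__ == "__main__":
    for source, destination, name in kerberos_errors(SAMPLE_TRACE):
        print(source, "->", destination, name)
```

The `csv` module keeps commas inside quoted fields intact, so info strings such as `[FIN, ACK]` stay in one column.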

Thoughts anyone?

SIMPLE QUESTION: HOW TO MIGRATE FROM WINDOWS 2008 R2 + SQL 2012 FAILOVER CLUSTER to WINDOWS SERVER 2012 CLUSTER WITH ALWAYS ON AVAILABILITY GROUP

Hello,

We have a 2-node Windows 2008 R2 Enterprise Edition failover cluster with Fibre Channel shared storage (SAN) running SQL Server 2012 SP1. Below is the current configuration: very simple and classic, I would say everything by the book:

This is what I think we want to achieve:

Objectives:

1. Upgrade Windows Operating System from Windows Server 2008 R2 to Windows Server 2012

2. Migrate to SQL Server 2012 Always On Availability Group (AAG) for High Availability and Disaster Recovery

My question is how to achieve both goals?

If possible, I would like to upgrade the OS first. Ideally I would upgrade on the same hardware, because that should have minimal impact (no need to migrate data). If this is not possible, we also have new hardware I can use, but I guess that would have more impact and would require an actual data migration.

For AAG, what I'm honestly missing is what the name of the second SQL Server would be. Let's say my servers are called DB1 and DB2, and the SQL Server is called DB. If I create an AAG and fail over to the replica server, would the SQL Server name be DB as well?

I know there is lots of documentation on AAG, and I went through it, but I cannot find any specific information about names.

Another question: would a 3rd server (DB3) be part of the same MSCS cluster, or would it be a separate server? How exactly does failover work? Do I use Failover Cluster Manager to initiate it?

Sorry for lots of questions, but any information would be appreciated very much.

Thanks!



Host Unreachable intermittently within a Windows Network Load Balancing Cluster

Hi,

We have 2 Windows 2008 R2 servers running multiple IIS web sites and load balanced across Windows Network Load Balancer in unicast mode. Although there are two interfaces in each server, only 1 interface in each server participates in load balancing and other interface is used for a different backup LAN. The problem I am going to mention was not seen within the NLB for almost 1 year.

For the past 3 weeks, NLB has intermittently reported "host unreachable" for each host. After the servers are rebooted, both hosts can be reached and are detected by NLB Manager. However, within minutes both servers become unreachable, and then become reachable again after several minutes. This behavior shows up in the load balancer, and pings between the two hosts fail while the issue is occurring. I took a packet capture to see what was going on with ARP during the issue. The ARP entry goes missing on each server, and neither server returns ARP replies, although both servers still send ARP requests. ARP replies come back after some time, after which the hosts become reachable again.

I tried creating a permanent static ARP entry on each host (by copying the MAC address from the ARP table while the two hosts were reachable), but that hasn't solved the issue either. It seems that the individual MAC address generated by each host is a virtual one, and it doesn't respond when the problem occurs.

However, load balancing and the web sites remain fully functional even while the "host unreachable" issue is detected.

I would appreciate it if someone could help me dig out the real problem.

Thank you.

Creating Cluster over existing infrastructure

Hello

Right now we have two servers: one server running Exchange, shared drives, Active Directory, other software (camera, etc.) and one running our ERP and MSSQL Database.

I just bought a new server. My goal is for this server to take over automatically and seamlessly if either of the other two fails.

Can I do this by creating a new Failover Cluster on the new server, adding the existing services to it, migrating the data, and then adding the existing servers to the cluster? If not, how can I achieve what I need?

The most important thing is that I cannot have any downtime for any of the existing services on the first two servers.

Thank you!


how to convert CentOs running in VMware workstation 9, to .vhd or .vhdx to run it in HyperV 2012 R2

Hi there!
I have a CentOS VM running in VMware Workstation 9.0, and I want to convert it to a .vhd or .vhdx file to run it in Microsoft Hyper-V 2012 R2.
I've used Disk2Vhd for Windows guests, but I don't know what to use for this Linux machine.

Let me know your suggestions!

Regards!


Lasandro Lopez

NLB problem

Hi 

I have NLB configured between two virtual machines hosted via vCenter:

Server A IP : 192.168.1.1

Server B IP: 192.168.1.2

VIP: 192.168.1.3

The mode is unicast and the affinity is single host.

If I ping Server A or the VIP, it works fine, but if I ping the Server B IP, it doesn't reply!

If Server B goes down, the VIP goes down once the server is back up! If Server A goes down, the VIP stops working as well.

There are no errors in the application or system event logs, and I see the event for the host becoming active when one server is down, but it is not actually working!

Please advise, as I don't have much experience with load balancing. The servers are hosting IIS, but the port rule load balances all ports.

Both servers run Windows Server 2012.

Regards,


Testing Network failure with Windows Failover Cluster on Windows server 2012 R2

Hi,

We have a Windows failover cluster with two nodes (Node A and Node B) on Windows Server 2012 R2. I also installed a SQL Server 2012 Enterprise failover cluster instance on it. The quorum is disk based.

Node A : 10.1.1.146 (Assigned Vote : 1, Current Vote : 1)

Node B : 10.1.1.147  (Assigned Vote : 1, Current Vote : 1) 

Cluster : 10.1.1.150

SQL Instance : 10.1.1.151
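For reference, the vote arithmetic behind a disk-based quorum can be sketched in Python as follows. This assumes the disk witness carries one vote in addition to the two node votes listed above; the post does not show the witness vote, so that part is an assumption, and all names are hypothetical.

```python
def votes_needed(total_votes):
    """Smallest strict majority of the configured votes."""
    return total_votes // 2 + 1


def has_quorum(votes_present, total_votes):
    """True if the votes still reachable form a majority."""
    return votes_present >= votes_needed(total_votes)


# Node votes as listed in the post.
NODE_VOTES = {"Node A": 1, "Node B": 1}
# Assumption: the disk witness contributes one vote.
WITNESS_VOTE = 1

TOTAL_VOTES = sum(NODE_VOTES.values()) + WITNESS_VOTE

if __name__ == "__main__":
    # One node isolated, survivor can reach the witness disk.
    print(has_quorum(NODE_VOTES["Node B"] + WITNESS_VOTE, TOTAL_VOTES))
    # One node isolated, survivor cannot reach the witness disk.
    print(has_quorum(NODE_VOTES["Node B"], TOTAL_VOTES))
```

Under this assumption, a surviving node keeps quorum only if it can also bring the witness disk online, which may be worth checking when failover works in one direction but not the other.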

 

We are testing a network failure on Node A (active). If we disable the network on Node A, the cluster tries to activate Node B, but it fails somehow.

If Node B is active and we disable the network on Node B, failover happens and Node A becomes active. Manual failover works fine from both nodes.

Do we need to change something in the configuration? Are we missing something?



Ankur Kalavadia


SQL Server 2012 setup failed with error code 0x85940001 and message "Error while enabling Windows feature : NetFx3, Error Code : -2146498298 "

SQL Server 2012 setup failed during installation because of this error with error code "0x85940001" and message

"Error while enabling Windows feature : NetFx3, Error Code : -2146498298 , Please try enabling Windows feature : NetFx3 from Windows management tools and then run setup again. For more information on how to enable Windows features , see http://go.microsoft.com/fwlink/?linkid=227143"

We know this happened because a Windows Server 2012 R2 feature (NetFx3) was not enabled at setup time. We were unable to cancel the setup and proceeded to the end, and the following services failed during setup: "Database engine, Reporting services, Analysis services, Client tools, SQL Server Data Tools, Data Quality Services, SQL Server Replication, Master Data Services".

If it were a standalone server, I would have uninstalled all components, restarted the server, and reinstalled from scratch.

Can some experts help us here, since this is a clustered server configuration? What is the first action needed now, and what steps should we follow to set everything right and install the SQL cluster successfully with the same cluster name and shared drives? Thanks

disk failover works only in one direction winserver 2012R2

I have been banging my head against the wall trying to figure this out...

I have a cluster running WS2012R2.

The cluster is on 2 HP DL360 G7 servers, with all the latest ROMs, drivers, etc.

The disks are connected via Emulex AH403A dual-port fiber cards (the servers are identical).

3 disks are presented in the failover cluster manager.

When I try to move the disk from server1 (owner) to server2 it goes to status failed and owner as server2.

The error is :

Cluster resource 'Cluster Disk 1' of type 'Physical Disk' in clustered role '103f5606-e10d-46bd-83b7-2e4e770b5112' failed. The error code was '0x80070490' ('Element not found.').

I try to bring it online and it stays failed. I then move it back to server1 and it goes online.

To move it, I need to take it offline, move it, and then bring it online, and this works.

If I then move the disk whose owner is now server2 to server1, it works without any problems (as it should).

I can't figure out why this move only works correctly in one direction. I have all drivers and ROMs up to date.

Any help would be appreciated....

Mirrored Storage between two servers

Hello,

Is it possible to create redundant storage using two Windows 2012 R2 servers?  I've been looking all over, and any HA options always reference having external shared storage presented to the cluster (I assume NAS/SAN).  I am also assuming that these posts expect whatever tech is used for shared storage to keep the data fault tolerant (e.g., a mirrored SAN), so the Windows cluster need not be aware of that happening in the background.

Since I have no storage hardware in my environment, I was hoping there was a way to create a redundant file share/system between two Windows 2012 R2 servers, on which I could then create an iSCSI target to present a virtual drive to an application server cluster.  That way, if one of the "storage servers" fails, it won't take down the application running on the app server.  I looked into DFS; however, since the app locks its files, those won't replicate across.

I hope my question wasn't too convoluted.

Thanks for any guidance! 


Should one use MPIO and/or CSV in a Windows 2012 R2 guest cluster?

Should one use MPIO and/or CSV in a Windows 2012 R2 guest cluster using Fibre Channel LUN RDMs presented by VMware ESXi 5.5?

If MPIO were implemented, is there a preference for the hardware manufacturer's DSM vs. the Microsoft DSM in a guest cluster?

What partition size/offset is recommended for the MSR partition (currently set to 1000 MB)? Unfortunately, storage validation reports an error with a failing block write at block 2048 (which in turn may be related to the VMware ESXi 5.5 disk partition layout).

The current setup works without MPIO. The question is whether MPIO would help overcome the current warning about failing persistent SCSI-3 reservations.

What would be the benefit, if any, of using CSV in a guest cluster? The LUNs in scope would eventually hold SQL data and log files.

 Thanks in advance for your input.

 

Sassan Karai

Gentoo Linux and Microsoft Failover Clusters / Hyper-V

Hello,

Hoping there are a few people on the boards familiar with running Gentoo Linux guests under Microsoft FailOver Cluster / Hyper-V hosts.

I have four Gentoo Linux guest VMs (running kernel 3.12.21-r1) running under the Microsoft Failover Cluster system with Hyper-V as the host. All of the Hyper-V drivers are built into the kernel (including the utilities and balloon drivers) and generally they run without issue.

For several months now, however, I have been having strange issues with them. Essentially they stop responding to network requests after random intervals. However, these intervals aren't a few minutes or hours from each other; more like days or even weeks before one of them will stop responding on the network side.

The funny thing is that the VMs themselves still respond on the console side. However, if I issue a reboot command on the externally non-responsive VM, the system eventually gets to a stage where all of the services are stopped and then hangs right after the "mounting remaining system ro" line (or something like that).

The Failover Cluster Manager then reports that the system is "Stopping" but the system never reboots.

I have to completely restart the HOST system so that either (A) the VM in question transfers to another host and starts responding again or (B) when the HOST comes back up I can work with the VM again.

This *ONLY* happens on the Gentoo Linux guest VMs and not my Windows VMs.

Wondering if anyone has hints on this.

Thank you for your time.

Regards, Christopher K.

Windows 2008 failover clustering fails with Event ID 1205 1069 1558

I have a two-node Windows 2008 and SQL 2008 cluster running active/passive. After restarting my nodes to apply Windows updates, I received errors in Failover Cluster Manager with Event IDs 1205, 1069, and 1558. This happens every month. Could someone help me find the root cause, and tell me how to check whether it is an issue with the network or the quorum drive?


Is CIFS Share in Netapp Filer supported as File Share Witness?

Hi All,

I'm currently setting up a Windows 2008 R2 Cluster and I'm having a hard time making a CIFS share work as a File Share Witness. Is this a supported configuration?

Thank you!

adlpena

Windows 2012 R2 Hyper-V clustering using "share nothing" strategy

Greetings,

I have been trying to embrace the share-nothing strategy Microsoft has put forth with Hyper-V clustering.  By definition, "A shared nothing architecture is a distributed computing architecture in which each node is independent and self-sufficient, and there is no single point of contention across the system. More specifically, none of the nodes share memory or disk storage."  I have plenty of experience with Failover Clustering using iSCSI and FC.  What I was looking to get out of this lab was a true Hyper-V cluster that relies on DAS and moves away from iSCSI and FC, without depending on a single file server, which would defeat the share-nothing strategy.

I have spent a week on this lab and gotten almost nowhere.  Does anybody know if this is truly possible, and can anyone point me to a step-by-step guide or lab that covers this topic from start to finish?  All those I have read eventually begin using iSCSI targets or network shared storage.

Thanks much in advance.

CAU - ignore WSUS

I just installed a new Windows Server 2012 R2 cluster with Cluster-Aware Updating. I'm testing CAU before going live with this cluster.
CAU shows "No updates", but when I manually scan for updates from Server Manager -> Windows Updates, it shows lots of updates to install.

Maybe WSUS (used by SCCM) interferes with the CAU update search? Is it possible to ignore WSUS and search for updates only from Microsoft Update (web)?

Connection was interrupted between Failover cluster and shared storage

Hi guys,

My Hyper-V setup consists of 2 HP servers running Windows Server 2008 in a cluster, and the VM storage is shared through NetApp SnapDrive.

I ran into an issue yesterday: one of the failover cluster nodes was disconnected from the shared storage, and a disk was missing from the ‘Server management\Storage\disk management’ and ‘Server management\Storage\Snapdrive\Disks’ panes. The Windows system log showed the following:

Event ID: 1038   Ownership of cluster disk 'Disk G:\' has been unexpectedly lost by this node. Run the Validate a Configuration wizard to check your storage configuration.

Event ID: 1069   Cluster resource 'Disk G:\' in clustered service or application 'a3d8511b-6232-44a9-9c47-5e65851e2e09' failed.

Event ID: 61110   ONTAP DSM was unable to communicate with the logical unit on DSM ID 03000102. The DSM will attempt a fail-over.  The data section of this log entry contains the NTSTATUS code.

Event ID: 15   The device, \Device\Harddisk3\DR3, is not ready for access yet.

Event ID: 61034   The multipath logical unit /vol/vol3/qtree2/{15489d81-9dc6-4a97-82fb-10e7a8c40d34}.rws on storage system CN-COQ-Storage2 disconnected.

The cluster events were shown as below:

Event ID:  1038  Ownership of cluster disk 'Disk G:\' has been unexpectedly lost by this node. Run the Validate a Configuration wizard to check your storage configuration.

Event ID:  1069  Cluster resource 'Disk G:\' in clustered service or application 'a3d8511b-6232-44a9-9c47-5e65851e2e09' failed.

Event ID:  1034   Cluster physical disk resource 'Disk G:\' cannot be brought online because the associated disk could not be found. The expected signature of the disk was '{1b20a5bc-bcd0-489a-b84f-935982eaf484}'. If the disk was replaced or restored, in the Failover Cluster Manager snap-in, you can use the Repair function (in the properties sheet for the disk) to repair the new or restored disk. If the disk will not be replaced, delete the associated disk resource.

Event ID:  1205   The Cluster service failed to bring clustered service or application 'a3d8511b-6232-44a9-9c47-5e65851e2e09' completely online or offline. One or more resources may be in a failed state. This may impact the availability of the clustered service or application.

Event ID:   5121    Cluster Shared Volume 'VHDData' ('Disk G:\') is no longer directly accessible from this cluster node. I/O access will be redirected to the storage device over the network through the node that owns the volume. This may result in degraded performance. If redirected access is turned on for this volume, please turn it off. If redirected access is turned off, please troubleshoot this node's connectivity to the storage device and I/O will resume to a healthy state once connectivity to the storage device is reestablished.

I contacted NetApp; they told me the connection was reset by the initiator and suggested I contact Microsoft. I want to find the root cause of this issue, so please help me. Thanks in advance.

What is the value of the 'Name' attribute of an Instance of Resource Type IPv6 Address?

See subject. I'm writing a PowerShell script that is supposed to return the configured hostname and IP address for a service associated with a role on a failover cluster. For example, I'm using the following to retrieve the IPv4 address:

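# Look up the cluster resource for the service by its display name.
$a1 = Get-ClusterResource "MyServiceDisplayName"

# Find the "IP Address" resource in the same group (role).
$b1 = Get-ClusterGroup $a1.OwnerGroup.Name | Get-ClusterResource | Where-Object {$_.ResourceType -eq "IP Address"}

# Read the "Address" parameter of that IP Address resource.
$c1 = Get-ClusterResource $b1.Name | Get-ClusterParameter | Where-Object {$_.Name -eq "Address"}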

and then I access the result via $c1.Value.

I want to do the same for an IPv6 address. What I don't know is the 'Name' attribute returned by the Get-ClusterParameter command, that is, the condition for the Where-Object in the line that gives me $c1. (For the IPv4 address I got it by running this on our test server, but that server does not have IPv6 configured, so I cannot do the same in this case.) Is it simply 'Address' as well?

TIA, Thomas
