Channel: High Availability (Clustering) forum

Cluster Network Unavailable on one node


Hi,

We have a 5-node cluster. Each node has 4 NICs. NIC 1 and NIC 4 are teamed and presented to Hyper-V, NIC 2 is for CSV and is in a separate VLAN, and NIC 3 is for Live Migration and is also in a separate VLAN (the cluster name and cluster IP are in the same VLAN as NIC 3, and the nodes communicate with them through this network).

OS: 2008 R2 Datacenter.

Before we started testing, everything worked perfectly. Validation passed multiple times.

On the switch that the nodes are connected to, we disabled the port for NIC 3 on all nodes (the cluster name and cluster IP automatically went offline), and then we disabled the one port that was mapped to NIC 2 on Node 1.

As we expected, the cluster service on Node 1 went down.

Then we re-enabled the switch ports for NIC 2 and NIC 3 on Node 1. We tried to start the cluster service on Node 1, but it failed.

At the same time, in Failover Cluster Manager, the networks that represent NIC 2 and NIC 3 on Node 1 went from the Failed state to the Unavailable state.

We then re-enabled all the switch ports that had been disabled and started the cluster name and cluster IP. In Failover Cluster Manager, all networks on all nodes except Node 1 went to the Up state.

Ping and RDP to Node 1 through NIC 2 and NIC 3 worked. At that point we noticed that the cluster service on Node 1 was crashing.
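Ping and RDP only show that basic IP connectivity is back. They do not show whether the port the cluster service itself uses (3343 by default, as far as I know) is reachable between the nodes. A minimal sketch of such a check in Python, assuming Python is available on the node. The function name `can_connect` is my own, and any IP addresses passed to it are placeholders to be replaced with the real NIC 2 / NIC 3 addresses:

```python
import socket

# Assumption: the cluster service listens on its default port, 3343.
CLUSTER_PORT = 3343


def can_connect(target_ip, port=CLUSTER_PORT, source_ip=None, timeout=3.0):
    """Return True if a TCP connection to target_ip:port succeeds.

    If source_ip is given, the socket is bound to that local address first,
    so the attempt leaves through the NIC that owns that address.
    """
    source = (source_ip, 0) if source_ip else None
    try:
        with socket.create_connection(
            (target_ip, port), timeout=timeout, source_address=source
        ):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

Run on Node 1, a call such as `can_connect("<Node 2 CSV address>", source_ip="<Node 1 NIC 2 address>")` would show whether a TCP connection over the CSV network succeeds. This only tests TCP, so it says nothing about UDP heartbeats on the same port.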

The cluster node command on Node 1 reported that all other nodes were Down and that Node 1 was Joining.

The cluster node command on all other nodes reported that Node 1 was Joining and that all other nodes were Up.

We tried to start the cluster service on Node 1 with the /forcecluster and /ips switches, but that didn't solve the problem.

Node 1 reported an IP address conflict with the cluster IP (I guess it was trying to take control of this resource, which was up at that time?).

Then suddenly, about 2 hours after the problems started, NIC 2 and NIC 3 on Node 1 went to the Up state in Failover Cluster Manager, without any intervention from our side.

Does anyone have any idea what happened? Is there some kind of timeout, after which Node 1 tried to communicate with the cluster resources again?

Any help would be appreciated. Thanks in advance.

