Hi :)
Yesterday I had a failover cluster meltdown :( which I suspect was caused by a network failure and, quite possibly, a network misconfiguration on the nodes.
The error is:
Event ID: 1127
Task Category: Network Manager
Cluster network interface 'node1 - vlan99' for cluster node 'node1' on network 'vlan99' failed. Run the Validate a Configuration...
A short network config recap:
OS: Windows Server 2008 R2 SP1 Datacenter / 5 nodes
All 5 nodes have two NICs teamed (nic2 and nic3) in switch fault tolerance mode, and the virtual teamed interface is in trunk mode, so it carries several VLANs:
vlan152-csv - cluster use: internal - allow cluster network communication on this network, do not allow clients to connect through this network
subnet: 10.7.152.0/24
vlan156-lmig - cluster use: internal - allow cluster network communication on this network
subnet: 10.7.156.0/24
vlan77-daex - cluster use: disabled - do not allow cluster network communication on this network
subnet: 192.168.168.0/24
vlan99-nodes - cluster use: enabled - allow cluster network communication on this network, allow clients to connect through this network
subnet: 10.0.0.0/16, interface is first in the binding order
The DCs are also on subnet 10.0.0.0/16, and the CAP has IP address 10.0.67.67 from that subnet. vlan99 is a VLAN stretched across 3 sites.
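To make the dependency explicit, here is a minimal Python sketch that checks which of the listed subnets contains the CAP address. The network names and subnets are taken from the recap above; the helper name `networks_containing` is just made up for this illustration and has nothing to do with the cluster tooling itself.

```python
import ipaddress

# Subnets exactly as listed in the recap above.
cluster_networks = {
    "vlan152-csv": "10.7.152.0/24",
    "vlan156-lmig": "10.7.156.0/24",
    "vlan77-daex": "192.168.168.0/24",
    "vlan99-nodes": "10.0.0.0/16",
}

# CAP address from the recap.
cap_ip = ipaddress.ip_address("10.0.67.67")


def networks_containing(ip, networks):
    """Return the names of all networks whose subnet contains the given IP."""
    return [
        name
        for name, cidr in networks.items()
        if ip in ipaddress.ip_network(cidr)
    ]


print(networks_containing(cap_ip, cluster_networks))  # ['vlan99-nodes']
```

So the CAP lives only on vlan99-nodes, the one network that is stretched across the 3 sites.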
The whole failover cluster went down (the blade servers, switches and storage are all located in site 1) after a router in site 2 went down. How is such a thing possible? I expected that the blade switch in site 1 would carry vlan99 itself, and that the nodes
in site 1 could communicate with each other over vlan99 because they are on the same switch.
However, from the network config recap above you can see that reconfiguration is needed.
Can I add one more VLAN interface, assign it an IP address, and then change the CAP IP address in the IP Address properties of the cluster core resources, so that I can
change the settings for the vlan99 interface in the cluster and not allow cluster communication on this network?
Should this new VLAN interface be internal to the blade switch only, like vlan152 and vlan156 (because it is used only for communication among those 5 nodes on the same switch)?
After that change on the vlan99 interface, can I expect any problems with authentication against the DCs (they are on the same vlan99)?
The failover cluster's purpose is hosting Hyper-V servers, so I do not have clustered resources that clients need to connect to, and I will also remove "allow clients to connect through this network".
Thank you for any advice
Best regards
Nenad