I am having an intermittent problem with the TEAMed NICs losing their network connectivity.
But first, here are some details about my environment. I have a four-node Hyper-V cluster running Windows Server 2012 R2. The servers are Dell R720s, each with two Broadcom quad-port 1Gb NICs (B5720 and B5719). This gives each of my servers a total of eight 1Gb ports. With these, I’ve set up two network teams. Here’s how my eight NICs are set up:
- I’ve created a TEAM with Port1 on the B5720 and Port1 on the B5719. These two TEAMed NICs are plugged into Cisco switch ports that have been assigned to my “SERVER Network” (the VLAN where all my servers can communicate with each other). This TEAM is used for regular server-to-server communication and access to AD. We will call this team “SERVER_TEAM”.
- I’ve also created a second TEAM with Port2 on the B5720 and Port2 on the B5719. These two TEAMed NICs are plugged into Cisco switch ports that are configured as trunks, with various VLANs tied to them. I then created a Virtual Switch in Hyper-V using this TEAM. We will call this team “vSWITCH_TEAM”.
- The remaining four ports have been left as individual NICs: one for Live Migration, one for internal cluster communication, and two for SMB traffic.
Both teams have been configured as follows:
- Teaming mode = Switch Independent
- Load balancing mode = Dynamic
- Standby adapter = None (all adapters Active)
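To make the layout above easier to check at a glance, here is a rough sketch of it as a small Python structure. It is only an illustration: the port labels for the four standalone NICs (Port3 and Port4 on each card) and the role names SMB1/SMB2 are assumptions, since only the teamed ports are spelled out above, and the two helper functions are made up for this sketch.

```python
from collections import Counter

# Team settings as listed above (identical for both teams).
TEAM_SETTINGS = {
    "teaming_mode": "Switch Independent",
    "load_balancing_mode": "Dynamic",
    "standby_adapter": None,  # None = all adapters active
}

# Port labels are placeholders of the form "<card>/<port>".
# Which physical port carries each standalone role is an ASSUMPTION
# made for illustration; only the teamed ports come from the description.
LAYOUT = {
    "SERVER_TEAM": ["B5720/Port1", "B5719/Port1"],
    "vSWITCH_TEAM": ["B5720/Port2", "B5719/Port2"],
    "LiveMigration": ["B5720/Port3"],  # assumed port
    "Cluster": ["B5719/Port3"],        # assumed port
    "SMB1": ["B5720/Port4"],           # assumed port
    "SMB2": ["B5719/Port4"],           # assumed port
}


def check_layout(layout, expected_ports=8):
    """Return a list of problems: ports used twice, or a wrong port total."""
    counts = Counter(port for ports in layout.values() for port in ports)
    problems = [f"{port} is used {n} times" for port, n in counts.items() if n > 1]
    if len(counts) != expected_ports:
        problems.append(
            f"expected {expected_ports} distinct ports, found {len(counts)}"
        )
    return problems


def teams_span_both_cards(layout):
    """True if every two-member entry takes one port from each card."""
    for ports in layout.values():
        if len(ports) == 2:
            cards = {port.split("/")[0] for port in ports}
            if len(cards) != 2:
                return False
    return True


if __name__ == "__main__":
    print(check_layout(LAYOUT) or "layout OK")
    print("teams span both cards:", teams_span_both_cards(LAYOUT))
```

The point of the second check is that each team takes one port from each physical card, so a single card failing should not take down a whole team.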
Problem Description
Every once in a while, all the VMs on one node simultaneously lose their network connection. Right away the phone starts ringing off the hook, as our users can no longer access the services provided by the affected VMs. Live migrating the VMs to another host restores their network connections.
I’ve tried moving a non-critical VM back to the problem host and, as expected, that VM lost its network connection. If I reboot the host, everything is fine again, and I can move VMs back onto that host and they continue to talk to the network perfectly.
I’ve also found that, instead of rebooting the host, I can unplug the network cables used by the virtual switch team and plug them back in, and that also fixes the problem. This is a recurring issue that has happened on more than one of my hosts, so it’s not a hardware problem with one particular server.
Our “Cisco Guy” says that while the problem is occurring, he sees a large number of dropped packets on one of the interfaces in the team.
Does anyone have any ideas or suggestions? These Hyper-V hosts are brand new and fully patched, with the latest Broadcom NIC drivers installed.