Hello all,
I have been looking into an issue that only happens on a couple of ESXi hosts in a cluster.
Any vMotion migration from other hosts to these ones fails at 14% with the following message:
WARNING: MigrateNet: 1309: 1458908406025172 S: failed to connect to remote host <x.x.x.x> from host <y.y.y.y>: Timeout
WARNING: Migrate: 269: 1458908406025172 S: Failed: The ESX hosts failed to connect over the VMotion network (0xbad010b) @0x41802ba56f9a
I checked the configuration and compared the values with other, working ESXi hosts, as described in the following KB article:
The MTU, VMkernel settings, LAN settings, routing table and so on look identical to those of other working hosts in the same cluster.
I can even ping the hosts successfully over the vMotion network using the configured vmk interface.
I have been comparing the VMkernel logs from migrations between different ESXi hosts to find differences, and I spotted the following:
- Between two ESXi hosts where vMotion works correctly:
Migrate: vm 747618: 3286: Setting VMOTION info: Dest ts = AAAAAAAAAAAA, src ip = <x.x.x.x> dest ip = <x.x.x.z> Dest wid = 0 using SHARED swap
The SRC and DST IP addresses belong to the same LAN, which (ironically) is not the vMotion network at all but the management one.
- Between two ESXi hosts where vMotion does not work:
Migrate: vm 727726: 3286: Setting VMOTION info: Dest ts = AAAAAAAAAAAA, src ip = <x.x.x.x> dest ip = <y.y.y.y> Dest wid = 0 using SHARED swap
The SRC and DST IP addresses belong to different LANs: SRC is on the management network and DST is on the vMotion one.
I am running out of ideas. Does anyone know why I am seeing these differences?
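To make the comparison above repeatable across hosts, a minimal Python sketch could pull the src/dest addresses out of each "Setting VMOTION info" line and report which network each one sits on. The regex assumes the line format quoted above (it may differ between ESXi builds), and the subnets in EXAMPLE_NETWORKS (192.0.2.0/24 as management, 198.51.100.0/24 as vMotion) are hypothetical documentation ranges standing in for the real, redacted ones.

```python
import ipaddress
import re

# Assumption: the log line looks like the "Setting VMOTION info" lines quoted
# above; the angle brackets around the addresses are optional.
VMOTION_INFO = re.compile(
    r"src ip = <?(?P<src>\d{1,3}(?:\.\d{1,3}){3})>? "
    r"dest ip = <?(?P<dst>\d{1,3}(?:\.\d{1,3}){3})>?"
)

# Hypothetical subnets standing in for the real (redacted) management and
# vMotion networks; replace them with your own values.
EXAMPLE_NETWORKS = {
    "management": ipaddress.IPv4Network("192.0.2.0/24"),
    "vmotion": ipaddress.IPv4Network("198.51.100.0/24"),
}


def vmotion_endpoints(line):
    """Return (src, dst) as IPv4Address objects, or None if the line does not match."""
    match = VMOTION_INFO.search(line)
    if match is None:
        return None
    return (ipaddress.IPv4Address(match.group("src")),
            ipaddress.IPv4Address(match.group("dst")))


def network_of(addr, networks):
    """Return the name of the first network containing addr, or None."""
    for name, net in networks.items():
        if addr in net:
            return name
    return None


def report(line, networks=EXAMPLE_NETWORKS):
    """Describe which named network the source and destination addresses are on."""
    endpoints = vmotion_endpoints(line)
    if endpoints is None:
        return "no VMOTION info found"
    src, dst = endpoints
    src_net = network_of(src, networks)
    dst_net = network_of(dst, networks)
    if src_net is None or dst_net is None:
        verdict = "unclassified address"
    elif src_net == dst_net:
        verdict = "same network"
    else:
        verdict = "DIFFERENT networks"
    return f"src {src} ({src_net}) -> dst {dst} ({dst_net}): {verdict}"


if __name__ == "__main__":
    # Hypothetical line modelled on the failing migration above.
    sample = ("Migrate: vm 727726: 3286: Setting VMOTION info: Dest ts = AAAAAAAAAAAA, "
              "src ip = <192.0.2.10> dest ip = <198.51.100.20> Dest wid = 0 using SHARED swap")
    print(report(sample))
```

Running it against one log line from a working pair and one from a failing pair makes the management-vs-vMotion mismatch explicit. A dictionary of named networks is used instead of a single same-subnet check so the output says which LAN each end is on, matching the observation above.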
Any help would be much appreciated.