Troubleshooting ESXi Host Disconnections from vCenter Server


Last week I faced a serious issue in my home lab: my ESXi host was randomly disconnecting from my vCenter Server. Whenever I made a configuration change, such as enabling SSH or creating a new vSwitch, the host disconnected immediately. I was damn frustrated and went looking for a solution, because it made working in the lab very hard. So I started troubleshooting by going through my vCenter Server log files and found the following:

2015-03-16T23:06:11.270+05:30 [06304 info 'vpxdvpxdMoHost' opID=BADE9DBF-0000007B-b1] [HostMo] host connection state changed to [CONNECTED] for host-35
2015-03-16T23:06:11.273+05:30 [06304 info 'vpxdvpxdMoHost' opID=BADE9DBF-0000007B-b1] [HostMo::SetComputeCompatibilityDirty] Marked host-35 as dirty.

2015-03-16T23:02:09.628+05:30 [04380 info 'vpxdvpxdHostCnx' opID=SWI-7e4a49e9] [VpxdHostCnx] No heartbeats received from host 5294adb1-584a-2f13-8987-7b52ed31c84b within 120665000 microseconds

2015-03-16T23:02:09.628+05:30 [09928 info 'vpxdvpxdInvtHostCnx'] [VpxdInvtHost] Got lost connection callback for host-35
2015-03-16T23:02:09.629+05:30 [05548 info 'commonvpxLro'] [VpxLRO] -- BEGIN task-internal-46 -- host-35 -- VpxdInvtHostSyncHostLRO.Synchronize --
2015-03-16T23:02:09.629+05:30 [05548 warning 'vpxdvpxdInvtHostCnx'] [VpxdInvtHostSyncHostLRO] Connection not alive for host host-35
2015-03-16T23:02:09.629+05:30 [05548 info 'vpxdvpxdInvtHostCnx'] [VpxdInvtHost::FixNotRespondingHost] Attempting to fix not responding host host-35
2015-03-16T23:02:10.052+05:30 [05548 info 'vpxdvpxdHostAccess'] Got VpxaCnxInfo over SOAP version vpxapi.version.version9 for host megatron.alex.local

2015-03-16T23:06:32.368+05:30 [07760 warning 'Default'] Failed to connect socket; <io_obj p:0x000000000a6fa038, h:3300, <TCP '0.0.0.0:0'>, <TCP '[::1]:32010'>>, e: system:10061(No connection could be made because the target machine actively refused it)
2015-03-16T23:06:33.369+05:30 [07760 warning 'Proxy Req 00047'] Connection to localhost:32010 failed with error class Vmacore::SystemException(No connection could be made because the target machine actively refused it).
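The most telling entry is the VpxdHostCnx message: 120665000 microseconds is about 120.7 seconds without a single heartbeat from the host. If you want to pull such messages out of a long vpxd log, a minimal Python sketch could look like this. It assumes the lines are laid out exactly as in the excerpt above (ISO 8601 timestamp first, straight quotes); the regular expression and function name are illustrative, not part of any VMware tool.

```python
import re
from datetime import datetime

# Matches the vpxd message quoted above. Assumption: the wording and layout
# are exactly as in this excerpt (straight quotes, ISO 8601 timestamp first).
MISSED_HEARTBEAT = re.compile(
    r"^(?P<ts>\S+) .*No heartbeats received from host (?P<host>\S+) "
    r"within (?P<usec>\d+) microseconds"
)


def parse_missed_heartbeat(line):
    """Return (timestamp, host UUID, gap in seconds), or None if the line does not match."""
    match = MISSED_HEARTBEAT.search(line)
    if match is None:
        return None
    timestamp = datetime.fromisoformat(match.group("ts"))
    gap_seconds = int(match.group("usec")) / 1_000_000  # microseconds -> seconds
    return timestamp, match.group("host"), gap_seconds


line = ("2015-03-16T23:02:09.628+05:30 [04380 info 'vpxdvpxdHostCnx' opID=SWI-7e4a49e9] "
        "[VpxdHostCnx] No heartbeats received from host "
        "5294adb1-584a-2f13-8987-7b52ed31c84b within 120665000 microseconds")
print(parse_missed_heartbeat(line)[2])  # 120.665
```

To scan a whole log, call the function on each line of the file and keep the results that are not None.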

This suggested that something was going wrong with the heartbeat exchange between my host and vCenter Server, so I worked through the following steps:

1: I checked whether the ESXi host could reach my vCenter Server by pinging it and by testing a connection from the ESXi host to vCenter Server on port 902.

Note: The telnet command is not available on ESXi, so use the “nc -z” command instead (for example, nc -z <vCenter IP> 902).

[Screenshots Res-1 and Res-2: ping and nc -z results from the ESXi host to vCenter Server]

As the results show, I was able to reach my vCenter Server from my ESXi host successfully.
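If you want to script the same reachability test from a machine that has Python, a sketch equivalent to `nc -z <host> 902` is to attempt a TCP connection and report whether it succeeds. The function name and the 3-second default timeout are assumptions. Like `nc -z`, it only proves that a TCP connection can be opened; the heartbeats themselves travel over UDP, which a connect test cannot confirm.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (what `nc -z` checks)."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name not resolvable
        return False
```

For example, `tcp_port_open("vcenter.example.local", 902)`, with your own vCenter Server name in place of that hypothetical one.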

2: Next I checked whether my ESXi host was listening on port 902 (the heartbeat port).

[Screenshot Res-3: the ESXi host listening on port 902]

The command output confirmed that my host was listening on port 902.

3: I added the host disconnection timeout setting under Advanced Settings in vCenter Server and increased its value to 120.

[Screenshot Res-7: the timeout setting added under vCenter Advanced Settings]

I then verified that the value had been added.

[Screenshot Res-8: the new value listed in Advanced Settings]
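The timeout raised in this step is, in a standard vCenter Server setup, most likely `config.vpxd.heartbeat.notRespondingTimeout` (in seconds, default 60); treat that name as an assumption. The simplified model below, which is not vpxd's actual logic, shows what raising the value can and cannot achieve: it flags the host when any gap between heartbeat arrivals exceeds the timeout.

```python
# Simplified model, not vpxd's actual logic. Assumption: the timeout added in
# Advanced Settings is config.vpxd.heartbeat.notRespondingTimeout (seconds, default 60).
DEFAULT_TIMEOUT_S = 60


def longest_gap(arrivals, now):
    """Largest gap in seconds between heartbeat arrivals, including the open gap up to `now`."""
    if not arrivals:
        raise ValueError("need at least one heartbeat arrival time")
    times = sorted(arrivals) + [now]
    return max(later - earlier for earlier, later in zip(times, times[1:]))


def would_be_marked_not_responding(arrivals, now, timeout_s=DEFAULT_TIMEOUT_S):
    """True if any gap between heartbeats exceeds the not-responding timeout."""
    return longest_gap(arrivals, now) > timeout_s


jittery = [0, 10, 20, 90, 100, 110]   # one transient 70 s gap
silent = [0, 10, 20]                  # heartbeats stop after t = 20 s
print(would_be_marked_not_responding(jittery, 115))                 # True
print(would_be_marked_not_responding(jittery, 115, timeout_s=120))  # False
print(would_be_marked_not_responding(silent, 300, timeout_s=120))   # True
```

In this model a longer timeout hides transient packet loss, such as the single 70-second gap, but heartbeats that stop arriving altogether still exceed any timeout eventually.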

4: Next I checked the “Managed IP” setting on my vCenter Server. If the vCenter Server IP address is not set there, you can also run into this issue.

[Screenshot Res-5: vCenter Server Runtime Settings with the managed IP address]

In my case, I entered the IP address manually under Runtime Settings, as shown in the image above.

5: I checked the corresponding settings on my ESXi host.

[Screenshot Res-4: the vCenter Server that manages the ESXi host]

The image above makes it clear that my ESXi host is configured to be managed by the correct vCenter Server.

6: Next I checked the heartbeat port value on my ESXi host by running this command:

# grep -i serverport /etc/vmware/vpxa/vpxa.cfg

The output was strange: my ESXi host was using port 922 for the heartbeat exchange instead of the default port 902.
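The `grep` above is all you need on a single host. If you collect vpxa.cfg from several hosts, a minimal Python sketch could parse each file and compare `serverPort` with the default 902. It assumes vpxa.cfg is XML containing `serverIp` and `serverPort` elements; the `.//` search does not depend on their exact nesting. The function names and the trimmed sample content are hypothetical.

```python
import xml.etree.ElementTree as ET

DEFAULT_HEARTBEAT_PORT = 902


def vpxa_server_settings(xml_text):
    """Return (serverIp, serverPort) from vpxa.cfg text; missing values come back as None."""
    root = ET.fromstring(xml_text)
    # ".//tag" searches the whole tree, so the exact nesting does not matter.
    ip = root.findtext(".//serverIp")
    port_text = root.findtext(".//serverPort")
    port = int(port_text) if port_text else None
    return ip, port


def check_heartbeat_port(xml_text, expected=DEFAULT_HEARTBEAT_PORT):
    """Return a warning string if vpxa is not using the expected port, else None."""
    ip, port = vpxa_server_settings(xml_text)
    if port != expected:
        return f"vpxa reports to {ip} on port {port}, expected {expected}"
    return None


# Trimmed, illustrative vpxa.cfg content (192.0.2.10 is a documentation address).
sample = """<config>
  <vpxa>
    <serverIp>192.0.2.10</serverIp>
    <serverPort>922</serverPort>
  </vpxa>
</config>"""
print(check_heartbeat_port(sample))  # vpxa reports to 192.0.2.10 on port 922, expected 902
```

On a host, you would pass the contents of /etc/vmware/vpxa/vpxa.cfg instead of the sample string.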

According to VMware KB article 2040630:

This issue is caused by dropped, blocked, or lost heartbeat packets between the vCenter Server and the ESXi/ESX host. If there is an incorrect configuration of the vCenter Server managed IP address, the host receives the heartbeat from vCenter Server but cannot return it.

It is important to remember that the default heartbeat port is UDP 902, and these packets must be sent between vCenter Server and the ESXi/ESX host for the host to stay connected and remain in the vCenter Server inventory.

[Screenshot Res-9]
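The KB article's point that heartbeats are UDP packets explains why nothing on the sending side complains: UDP gives the sender no acknowledgement, so heartbeats that go to the wrong place simply vanish. The toy model below illustrates this on the loopback interface with Python's standard socket module. It is not the vpxa/vpxd heartbeat protocol; two OS-assigned ports merely stand in for the port the receiver listens on and a mismatched port.

```python
import socket


def heartbeat_delivered(send_to_listening_port, timeout=0.5):
    """Toy model: does a UDP 'heartbeat' reach the socket that is waiting for it?

    `receiver` stands in for the port the receiving side listens on and
    `elsewhere` for a mismatched port. Both are OS-assigned loopback ports.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as elsewhere, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        elsewhere.bind(("127.0.0.1", 0))
        receiver.settimeout(timeout)
        target = receiver.getsockname() if send_to_listening_port else elsewhere.getsockname()
        # UDP is fire-and-forget: sendto() succeeds whichever port is targeted.
        sender.sendto(b"heartbeat", target)
        try:
            data, _addr = receiver.recvfrom(64)
        except socket.timeout:
            return False  # nothing arrived where it was expected
        return data == b"heartbeat"


print(heartbeat_delivered(True))   # True
print(heartbeat_delivered(False))  # False
```

In both calls `sendto()` returns normally; only the receiving side can notice that heartbeats stopped, which matches vpxd logging "No heartbeats received" while the host itself reports no error.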

I changed the port to 902 by editing the vpxa.cfg file, then removed my ESXi host from vCenter Server and added it back, hoping the issue was resolved. Surprisingly, I was still getting disconnections. When I connected to the ESXi host over SSH and checked vpxa.cfg again, I found that the port had been changed back to 922. This was strange.

Digging further, I found that this was happening because the heartbeat port was set to 922 in the vCenter Server registry key. I got this clue from issue 2437489 posted in the VMware Community.

The full registry key is:

HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware VirtualCenter

[Screenshot Res-10: the registry key showing the heartbeat port set to 922]

As the image above shows, the heartbeat port was set to 922, which was causing all the trouble. I changed it to 902 and restarted my vCenter Server, and bingo, the issue was resolved.
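To check this value without opening regedit, a read-only Python sketch using the standard `winreg` module could look like this. The key path comes from above; the value name `HeartbeatPort` is a hypothetical placeholder, so use the exact name shown under that key on your vCenter Server. Because `winreg` exists only on Windows, it is imported inside the function. The fix itself stays as described: change the value to 902 and restart vCenter Server.

```python
# Key path from the article; the value name is a hypothetical placeholder.
REG_PATH = r"SOFTWARE\VMware, Inc.\VMware VirtualCenter"
VALUE_NAME = "HeartbeatPort"  # hypothetical: use the exact name shown in regedit
EXPECTED_PORT = 902


def read_vc_heartbeat_port(value_name=VALUE_NAME):
    """Read the heartbeat port stored under the vCenter Server registry key (Windows only)."""
    import winreg  # imported here because winreg exists only on Windows

    # KEY_WOW64_64KEY reads the 64-bit registry view even from 32-bit Python.
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, REG_PATH, 0,
                        winreg.KEY_READ | winreg.KEY_WOW64_64KEY) as key:
        value, _value_type = winreg.QueryValueEx(key, value_name)
    return int(value)


def heartbeat_port_problem(port, expected=EXPECTED_PORT):
    """Return a warning if the configured port differs from the expected one, else None."""
    if port != expected:
        return f"vCenter heartbeat port is {port}, expected {expected}"
    return None


# On the vCenter Server (Windows) you would run:
#     print(heartbeat_port_problem(read_vc_heartbeat_port()))
```

Keeping the registry read separate from the comparison lets the check be reused with a port obtained some other way.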

Hit Like and share on social media if you found this information helpful. Happy Learning!!!

About Alex Hunt

Hi all, I am Manish Kumar Jha, aka Alex Hunt. I currently work at VMware Software India Pvt Ltd as an Operations System Engineer (vCloud Air Operations). I have around 5 years of IT experience, with exposure to VMware vSphere, vCloud Director, RHEL and modern data center technologies such as Cisco UCS, Cisco Nexus 1000v and NSX. If you find a post informative, please like it, share it on social media, and leave a comment if you want to discuss it further. Disclaimer: All the information on this website is published in good faith and for general information purposes only. I don’t make any warranties about the completeness, reliability or accuracy of this information. Any action you take based on the information you find on this blog is strictly at your own risk. The views and opinions published on this blog are my own and not the opinions of my employer or any of the vendors of the products discussed.

10 Responses to Troubleshooting Esxi host Disconnection from vCenter issue

  1. Shyfur says:

    It a great article buddy!

  2. pavan says:

    This is really useful . Keep up the good work

  3. Sri says:

    Hey Alex, That’s a Good one you’ve found out, So While troubleshooting we’ve to check listening port after your Step-2 and the rest troubleshooting part would have been easier, This was much useful for me!!

  4. Suthan says:

    Good one!

  5. Tushar says:

    Keep Shining buddy…really helpful article.

  6. Sudheer says:

    Super Alex…

  7. santosh says:

    I faced a same issue.by seeing dis post i resolved the issue.Thanks for the post bhai.Keep going

  8. Prashant says:

    Good one.

  9. Mukesh says:

    Good one bro

  10. jaydip das says:

    helpfull for me
