Channel: VMware Communities : All Content - All Communities

ESXi 6 & Long Boot Times


3 Node Cluster Consisting of:

OS: ESXi 6.0.0 build 2615704, booting from SD card

HW: Dell R820 / 512GB / Xeon E5-4620 4x8

Storage 2x: ISP2432-based 4Gb FC PCI Express HBA

NIC 2x: Broadcom QLogic 57810 10Gb Dual Port

25 LUNs over FC provisioned from a VNX 5600

1 x 2-node Win 2012 & SQL 2014 WSFC & MSCS - 3 "Physical" RDMs which are mapped on 1 LUN to the 2 VMs.

 

Disclaimer - since upgrading all hosts to build 2615704, the ESXi hosts had not been updated, patched, or restarted.

 

  • We have a 3-node cluster, all nodes identical to the hardware and software config above.
  • For one reason or another I decided to restart one of the nodes (and while at it, what better time to perform updates?!).
  • 1 node was patched to the latest updates via Update Manager (build 2809209)
  • During the remediation process I noticed that it was taking far longer than normal to boot - up to 1 hour
  • With the iDRAC console open, I saw it was stuck at a few stages, one of them being nfs41client
  • ALT+F12 displayed the following results (similar to VMware KB:    FCoE storage connections fail when LUNs are presented to the host on different VLANs):
    • T.363Z cpu2:4616)<6>host2: fip: fcoe_ctlr_vlan_request() is done
    • T.365Z cpu0:4606)<6>host2: fip: host2: FIP VLAN ID unavail. Retry VLAN discovery.
    • T.365Z cpu0:4606)<6>host2: fip: fcoe_ctlr_vlan_request() is done
    • T.366Z cpu2:4622)<6>host2: fip: host2: FIP VLAN ID unavail. Retry VLAN discovery.
  • Thinking that it was a new driver issue, off I go and apply this fix - Zenfolio | Michael Davis | Broadcom BCM57810 FCoE and ESXi
  • No luck - booting time still takes a long, long time
  • Decided to rebuild the host, as it was getting silly how many hours I had spent troubleshooting
  • Disconnected my FC cables from the storage HBA, re-installed ESXi from VMware-VMvisor-Installer-6.0.0-2494585.x86_64-Dell_Customized-A00.iso and booted it up - instantly noticing that startup time was back to normal
  • Proceeded to reconnect the storage HBA, did some network config and restarted - now startup takes as long as before
  • While waiting for the system to start up I came across this VMware KB:    ESXi/ESX hosts with visibility to RDM LUNs being used by MSCS nodes with RDMs may take a long time to sta… 
  • I proceeded to flag the LUN which contains the RDM mappings for my 2 MSCS nodes as perennially reserved
    • esxcli storage core device setconfig -d naa.xxxx --perennially-reserved=true
    • Verify that it is set with esxcli storage core device list -d naa.xxxx and proceed to reboot
  • Reboot still takes a long time...

 

So I am not sure:

  • if --perennially-reserved=true still applies for ESXi 6.0?
    • of course, to really confirm whether the other 2 nodes in the VMware cluster are experiencing this issue (related to RDMs), I would have to restart at least one of them
  • if it's a driver related thing?
  • if I have missed something else?
  • if I am barking up the wrong tree?

 

I look forward to any comments, questions, ideas, suggestions, etc!

 

 

Thanks

Corin

