I've had many issues upgrading 7.x, and I'm currently upgrading production to 7.3, so here's a reference of all the log file locations relevant to an upgrade.
In a distributed environment the upgrade order is as follows (a one-liner to watch all three logs at once comes right after the list):
- download installer data (vami.log)
- prechecking (updatecli.log)
- upgrade management agents (updatecli.log)
- stop all infra services and other prep-tasks (updatecli.log)
- patch VA1 (updatecli.log)
- patch VA2 (updatecli.log)
- Reboot (manual step: you trigger it, which injects the reboot into VA1; upgrade.log from here onwards)
- once VA1 is powering back on it tries to inject the reboot into VA2. If VA2 is down at this stage, VA1 will hang here forever! You will only see the console screen on VA1, no SSH; the last thing it did successfully was starting postgres, I think
- wait for all services to come back up on VA1; this confirms a successful update of VA1. It waits a maximum of 45 minutes
- if all is good, start upgrading infra
Note: if the VA upgrade was not successful but you managed to fix it, you can still run the full infra upgrade with a single command on the VA, or one server after another manually (as it was done in 7.0-7.1).
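Since each phase writes to a different file, I keep a single tail running on VA1 against all three (paths as referenced throughout this post; tail -F keeps retrying, which helps since upgrade.log only appears once VA patching starts):
# tail -F /opt/vmware/var/log/vami/vami.log /opt/vmware/var/log/vami/updatecli.log /usr/lib/vcac/tools/upgrade/upgrade.log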
In detail:
vami.log
When you kick off the upgrade in VAMI, do a tail on the appliance where you kicked it off:
# tail -f /opt/vmware/var/log/vami/vami.log
This will log:
downloading the installer files (from the web or from CD):
13/07/2017 10:38:57 [INFO] Downloaded file. url=/package-pool/freetype2-2.3.7-25.44.1.x86_64.rpm
13/07/2017 10:38:57 [INFO] Downloaded file. url=/package-pool/gdbm-1.8.3-374.25.x86_64.rpm
13/07/2017 10:38:57 [INFO] Downloaded file. url=/package-pool/gfxboot-4.1.34-0.5.44.x86_64.rpm
13/07/2017 10:38:57 [INFO] Downloaded file. url=/package-pool/gfxboot-branding-SLES-4.1.34-0.5.44.x86_64.rpm
13/07/2017 10:38:57 [INFO] Downloaded file. url=/package-pool/glib2-2.22.5-0.8.26.1.x86_64.rpm ...
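A quick way to see how far the download has progressed is to count these entries (just a sketch, grepping for the exact string shown above):
# grep -c 'Downloaded file' /opt/vmware/var/log/vami/vami.log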
Then it figures out what is new or updated:
13/07/2017 10:41:31 [INFO] Creating install script files for updatecli
13/07/2017 10:41:32 [INFO] Using update pre-install script
13/07/2017 10:41:32 [INFO] Update version 7.3.0.536. Installing the following packages:
13/07/2017 10:41:32 [INFO] package UPDATE VERSION: VMware-Log-Insight-Agent noarch (none) 4.4.0 5314476 /package-pool/VMware-Log-Insight-Agent-4.4.0-5314476.noarch.rpm rpm 8920561 f93576c5471845a3a9f85435466d37c720c7aa91
13/07/2017 10:41:32 [INFO] package UPDATE VERSION: VMware-Postgres x86_64 (none) 9.5.6.0 5262417 /package-pool/VMware-Postgres-9.5.6.0-5262417.x86_64.rpm rpm 2033529 65685b2f03bc89c34e403596c70f4c97e108969d
13/07/2017 10:41:32 [INFO] package UPDATE VERSION: VMware-Postgres-contrib x86_64 (none) 9.5.6.0 5262417 /package-pool/VMware-Postgres-contrib-9.5.6.0-5262417.x86_64.rpm rpm 524286 2c35f976e809b6c02e270add682f9bf91dd05d97
13/07/2017 10:41:32 [INFO] package UPDATE VERSION: VMware-Postgres-extras x86_64 (none) 9.5.6.0 5262417 /package-pool/VMware-Postgres-extras-9.5.6.0-5262417.x86_64.rpm rpm 551339 ea72ee9901fe37924868855ac56cf2e544e7a01e
13/07/2017 10:41:32 [INFO] package UPDATE VERSION: VMware-Postgres-extras-init x86_64 (none) 9.5.6.0 5262417 /package-pool/VMware-Postgres-extras-init-9.5.6.0-5262417.x86_64.rpm rpm 5106 83911d033bc51a2e17be8ac6db97051d4b7dbc42
13/07/2017 10:41:32 [INFO] package NEW PACKAGE : VMware-Postgres-extras-systemd x86_64 (none) 9.5.6.0 5262417 /package-pool/VMware-Postgres-extras-systemd-9.5.6.0-5262417.x86_64.rpm rpm 3347 27e15ba57287bd30f1f761b4c27fd8d58e6ca092
13/07/2017 10:41:32 [INFO] package NEW PACKAGE : VMware-Postgres-extras-sysv x86_64 (none) 9.5.6.0 5262417 /package-pool/VMware-Postgres-extras-sysv-9.5.6.0-5262417.x86_64.rpm rpm 4938 ceb935f772b5d8c8e63d4299992adb6076fa06ea
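If you only want to know what is brand new versus merely updated, you can filter on the two markers visible above:
# grep -E 'NEW PACKAGE|UPDATE VERSION' /opt/vmware/var/log/vami/vami.log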
Once all this is done it will finish with:
13/07/2017 10:41:32 [INFO] Using update post-install script
13/07/2017 10:41:32 [INFO] Running updatecli to install updates. command={ /opt/vmware/share/vami/update/updatecli '/opt/vmware/var/lib/vami/update/data/job/10' '7.1.0.710' '7.3.0.536' ; /opt/vmware/bin/vamicli version --appliance ; } >> /opt/vmware/var/log/vami/updatecli.log 2>&1 &
13/07/2017 10:41:33 [INFO] Installation running in the background
If you're using the CD upgrade, this part takes around 3 minutes total.
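After the handover you can verify updatecli really is running in the background (process name taken from the command logged above; the [u] trick just stops grep from matching itself):
# ps -ef | grep '[u]pdatecli'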
updatecli.log
Then it invokes updatecli, which you can monitor like so:
# tail -f /opt/vmware/var/log/vami/updatecli.log
This log is a bash trace (every command is echoed with a leading +), formatted like this:
++ date '+%Y-%m-%d %H:%M:%S'
+ echo '2017-07-13 10:41:55 /etc/bootstrap/preupdate.d/00-00-01-va-resources-check done, succeeded.'
+ for script in '"${bootstrap_dir}"/*'
+ echo
+ '[' '!' -e /etc/bootstrap/preupdate.d/00-00-02-check-replica-availability ']'
+ '[' '!' -x /etc/bootstrap/preupdate.d/00-00-02-check-replica-availability ']'
+ log '/etc/bootstrap/preupdate.d/00-00-02-check-replica-availability starting...'
++ date '+%Y-%m-%d %H:%M:%S'
+ echo '2017-07-13 10:41:55 /etc/bootstrap/preupdate.d/00-00-02-check-replica-availability starting...'
+ /etc/bootstrap/preupdate.d/00-00-02-check-replica-availability 7.1.0.710 7.3.0.536
+ log '/etc/bootstrap/preupdate.d/00-00-02-check-replica-availability done, succeeded.'
++ date '+%Y-%m-%d %H:%M:%S'
+ echo '2017-07-13 10:42:19 /etc/bootstrap/preupdate.d/00-00-02-check-replica-availability done, succeeded.'
Note above that it tells you the time it starts a script, then logs exactly how it's executed, including all parameters (/etc/bootstrap/preupdate.d/00-00-02-check-replica-availability 7.1.0.710 7.3.0.536), and finally what time it finished.
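Since every script gets a 'starting...' and a 'done, succeeded.' line with a timestamp, you can pull a compact per-script timeline out of the trace (a sketch; tweak the pattern if your trace differs slightly):
# grep -E "^\+ echo '.*(starting\.\.\.|done, succeeded)" /opt/vmware/var/log/vami/updatecli.log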
If one of those scripts fails, the first thing I do is run it manually, exactly as logged, because that way it logs to stdout. In this example, if it had failed, I would run it including the parameters:
/etc/bootstrap/preupdate.d/00-00-02-check-replica-availability 7.1.0.710 7.3.0.536
This will display all steps and show where it fails. Once you fix your problem, run the script again until it succeeds, then re-try the upgrade. In the past I reverted first, but in hindsight I don't think this is needed.
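The trace logs 'done, succeeded.' after a script returns, presumably based on its exit code, so when re-running manually I check that explicitly (same script and parameters as above):
/etc/bootstrap/preupdate.d/00-00-02-check-replica-availability 7.1.0.710 7.3.0.536
echo "exit code: $?"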
The main events you see in this updatecli.log file:
- pre-checking, upgrading management agents, stopping IaaS services, … ~40 minutes
28/06/2017 10:00:35 [INFO] Update status: Done pre-install scripts
- patching VA1 (you see all steps) ~20 minutes
- starting to patch VA2 if distributed
echo '2017-06-28 10:21:31 /etc/bootstrap/postupdate.d/995-upgrade-replicas starting...'
- Note: go to VA2 and tail -f /opt/vmware/var/log/vami/updatecli.log to see the details. This takes around 20-30 minutes as well
Then there is a 10-15 minute delay from when VA2 is done according to this log file until it proceeds with more steps. It will finally finish on VA2 and later on VA1 with:
28/06/2017 10:45:14 [INFO] Update status: Update completed successfully
28/06/2017 10:45:14 [INFO] Install Finished
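The 'Update status' lines make handy milestones; grepping for them gives a one-screen view of how far the upgrade has come:
# grep 'Update status' /opt/vmware/var/log/vami/updatecli.log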
At this stage you should see in VAMI that the VAs are upgraded and that you should now reboot. You can confirm this by checking upgrade.log:
cat /usr/lib/vcac/tools/upgrade/upgrade.log
What you see in this log is exactly what is displayed in VAMI!
Reboot
As mentioned, VA1 is rebooted manually by you, and once you can see in the VM console that postgres has started, it already sends the reboot command to VA2. If VA2 is offline at this stage, VA1 will hang here; it will not carry on unless it can tell someone to reboot …
I typically SSH to VA2 and wait for the console alert:
The system is going down for reboot NOW!
Then I know VA1 injected the reboot into VA2, and I should be able to SSH back into VA1 very soon.
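To know the moment VA1 is reachable again, I poll port 22 from my workstation (a sketch; va1.example.local is a placeholder for your VA1 hostname, and it assumes nc is available):
until nc -z -w 5 va1.example.local 22; do
    sleep 15
done
echo "SSH on VA1 is answering again"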
Once VA1 has started it will bring up its services (vcac, vco, rabbitmq …).
upgrade.log will only state "Waiting all services to start":
[UTC: 2017-06-26 13:28:55.004731; Local: 2017-06-26 13:28:55.004753] [INFO]: Waiting all services to start…
It will wait for a maximum of 45 minutes. If the services don't come up in this time, the upgrade will stop at this stage.
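If you hit that timeout, check the individual services yourself before re-initiating. The names below are the ones I believe correspond to the services mentioned earlier; verify them against /etc/init.d on your appliance:
for svc in vpostgres vcac-server vco-server rabbitmq-server; do
    service "$svc" status
done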
If you can fix what's wrong (get all services up), you can re-initiate this second phase (the infra upgrade) with:
/usr/lib/vcac/tools/upgrade/upgrade &
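One habit of mine (not from any docs): if you start this over SSH, wrap it in nohup so a dropped session doesn't kill the second phase:
nohup /usr/lib/vcac/tools/upgrade/upgrade >/dev/null 2>&1 &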
Infra upgrade
Keep monitoring upgrade.log:
tail -f /usr/lib/vcac/tools/upgrade/upgrade.log
As mentioned before, the content of this file is what is displayed in VAMI! Unfortunately I don't have any info on upgrade logs on the infra side.
If you do the infra upgrade manually, you can monitor the logs in c:\program files (x86)\vmware\… depending on what is upgraded. If it's running from the VA they don't seem to be populated. I might be wrong …