Partition and Format an Added (2nd, 3rd, …) Hard Drive in vRealize Automation

 

This is an older one, but I came across it again and it seems to be a common question, so I thought I’d post it.

The Problem

When you deploy a VM through vRealize Automation and choose to add one or more disks at deployment time, it will politely ask you for a partition letter and name. The deployed machine will even have the disk attached if you check Disk Management, but since it is not partitioned and formatted, it will not show up in Explorer.

There have been workflows written for Orchestrator that check the custom properties and, if Disk1 is configured, read its size and push it to an array, then prepare a script, copy it to the guest machine and execute it. It is well worth looking at, as I have re-used parts of it for other provisioning problems. Check out this thread for details: https://communities.vmware.com/thread/488084

There is no need to go to that length, as all the code necessary is already contained in the VMware Guest Agent. In our case it is installed and configured on every template machine, and I can only assume this is the case for anyone else, since you need it to run software components.

The (simple!) Solution

The only thing you need to do is add a custom property to your machine in the blueprint: VirtualMachine.Admin.UseGuestAgent, set to true.
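
For reference, together with the disk details you enter on the request form, the machine's custom properties end up looking roughly like this (just a sketch; the Disk1 numbering, size, drive letter and label are illustrative values, not taken from a real blueprint):

VirtualMachine.Admin.UseGuestAgent = true
VirtualMachine.Disk1.Size = 20
VirtualMachine.Disk1.Letter = E
VirtualMachine.Disk1.Label = Data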

This makes sure that the Guest Agent is invoked, even if there is no software installation to be done. You get a couple of additional entries in the audit log:

7/31/2017 4:14 PM UTC+01:00 Machine Life Cycle Event Machine: vmname: State changed to On.
7/31/2017 4:14 PM UTC+01:00 Machine Life Cycle Event Machine: vmname: State changed to TurningOn.
7/31/2017 4:13 PM UTC+01:00 Machine Life Cycle Event Machine: vmname: State changed to MachineActivated.
7/31/2017 4:13 PM UTC+01:00 Machine Life Cycle Event Machine: vmname: State changed to MachineProvisioned.
7/31/2017 4:12 PM UTC+01:00 Machine Life Cycle Event Machine: vmname: State changed to FinalizeProvisioning.
7/31/2017 4:12 PM UTC+01:00 Machine Life Cycle Event Machine: vmname: State changed to PrepareInstallSoftware.
7/31/2017 4:10 PM UTC+01:00 Machine Life Cycle Event Machine: vmname: State changed to CustomizeOS.
7/31/2017 4:10 PM UTC+01:00 Machine Life Cycle Event Machine: vmname: State changed to InitialPowerOn.
7/31/2017 4:04 PM UTC+01:00 Machine Life Cycle Event Machine: vmname: State changed to CustomizeMachine.
7/31/2017 4:03 PM UTC+01:00 Machine Life Cycle Event Machine: vmname: State changed to CloneMachine.
7/31/2017 4:03 PM UTC+01:00 Machine Life Cycle Event Machine: vmname: State changed to BuildingMachine.
7/31/2017 4:03 PM UTC+01:00 Machine Life Cycle Event Machine: vmname: State changed to WaitingToBuild.
7/31/2017 4:03 PM UTC+01:00 Machine Life Cycle Event Machine: vmname: State changed to Requested.

Even better, this runs before MachineProvisioned (at least in 7.3, where I just checked), so you can already use this disk in vRO workflows triggered during MachineProvisioned.

[added_disk1: adding the disk at request time]

[added_disk2: the added, partitioned and formatted disk]

Additional info

The scripts that partition the drives can be found in the Guest Agent folder:

C:\VRMGuestAgent\site\Partition

Make sure your templates only have 1 CPU

My colleague was doing some troubleshooting around deployment times in vRealize Automation. We noticed that all VMs we deploy with a single CPU take about 5 minutes longer to deploy than those with 2 CPUs or more.

Our templates typically have 2 CPUs. When we deploy with 2 or more CPUs, customization simply adds CPUs if necessary. However, in most operating systems, reducing the CPU count requires a reboot of the VM, and this was adding to our deployment times.

Customization typically takes 5 minutes in our infrastructure, but over 10 minutes when the deployed machine has fewer CPUs than the template.

[customization_delay: screenshot of the customization delay]

To solve this issue, simply make sure you only use single-CPU templates. There is nothing to be gained from having templates with 2 or more CPUs… also, vROps will complain about the waste!

 


VRA 7.x upgrade log files and troubleshooting

I had many issues upgrading 7.x, and I’m currently upgrading production to 7.3, so here is a reference of all the log file locations relevant to an upgrade.

In a distributed environment the upgrade order is as follows:

  1. Download installer data (vami.log)
  2. Pre-checking (updatecli.log)
  3. Upgrade the management agents (updatecli.log)
  4. Stop all infra services and other prep tasks (updatecli.log)
  5. Patch VA1 (updatecli.log)
  6. Patch VA2 (updatecli.log)
  7. Reboot (a manual step; you need to do this). This injects the reboot into VA1 (upgrade.log from here onwards)
  8. Once VA1 is powering on, it tries to inject a reboot into VA2. If VA2 is down at this stage, VA1 will hang here forever! You will only see the console screen on VA1, no SSH; the last thing it did successfully was starting Postgres, I think
  9. Wait for all services to come back up on VA1; this confirms a successful update of VA1. It waits a maximum of 45 minutes
  10. If all is good, start upgrading the infra

Note: if the VA upgrade was not successful but you managed to fix it, you can still run the full infra upgrade with a single command line on the VA, or one server after another manually (as it was done in 7.0-7.1).
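
The single command in question is the same one shown further down in the reboot section:

/usr/lib/vcac/tools/upgrade/upgrade &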

In detail:

vami.log

When you kick off the upgrade in VAMI, do a tail on the appliance you kicked it off from:

# tail -f /opt/vmware/var/log/vami/vami.log

This will log the download of the installer files (from the web or from CD):

13/07/2017 10:38:57 [INFO] Downloaded file. url=/package-pool/freetype2-2.3.7-25.44.1.x86_64.rpm
13/07/2017 10:38:57 [INFO] Downloaded file. url=/package-pool/gdbm-1.8.3-374.25.x86_64.rpm
13/07/2017 10:38:57 [INFO] Downloaded file. url=/package-pool/gfxboot-4.1.34-0.5.44.x86_64.rpm
13/07/2017 10:38:57 [INFO] Downloaded file. url=/package-pool/gfxboot-branding-SLES-4.1.34-0.5.44.x86_64.rpm
13/07/2017 10:38:57 [INFO] Downloaded file. url=/package-pool/glib2-2.22.5-0.8.26.1.x86_64.rpm ...

Then it works out what is new or needs updating:

13/07/2017 10:41:31 [INFO] Creating install script files for updatecli
13/07/2017 10:41:32 [INFO] Using update pre-install script
13/07/2017 10:41:32 [INFO] Update version 7.3.0.536. Installing the following packages:
13/07/2017 10:41:32 [INFO] package UPDATE VERSION: VMware-Log-Insight-Agent noarch (none) 4.4.0 5314476 /package-pool/VMware-Log-Insight-Agent-4.4.0-5314476.noarch.rpm rpm 8920561 f93576c5471845a3a9f85435466d37c720c7aa91
13/07/2017 10:41:32 [INFO] package UPDATE VERSION: VMware-Postgres x86_64 (none) 9.5.6.0 5262417 /package-pool/VMware-Postgres-9.5.6.0-5262417.x86_64.rpm rpm 2033529 65685b2f03bc89c34e403596c70f4c97e108969d
13/07/2017 10:41:32 [INFO] package UPDATE VERSION: VMware-Postgres-contrib x86_64 (none) 9.5.6.0 5262417 /package-pool/VMware-Postgres-contrib-9.5.6.0-5262417.x86_64.rpm rpm 524286 2c35f976e809b6c02e270add682f9bf91dd05d97
13/07/2017 10:41:32 [INFO] package UPDATE VERSION: VMware-Postgres-extras x86_64 (none) 9.5.6.0 5262417 /package-pool/VMware-Postgres-extras-9.5.6.0-5262417.x86_64.rpm rpm 551339 ea72ee9901fe37924868855ac56cf2e544e7a01e
13/07/2017 10:41:32 [INFO] package UPDATE VERSION: VMware-Postgres-extras-init x86_64 (none) 9.5.6.0 5262417 /package-pool/VMware-Postgres-extras-init-9.5.6.0-5262417.x86_64.rpm rpm 5106 83911d033bc51a2e17be8ac6db97051d4b7dbc42
13/07/2017 10:41:32 [INFO] package NEW PACKAGE : VMware-Postgres-extras-systemd x86_64 (none) 9.5.6.0 5262417 /package-pool/VMware-Postgres-extras-systemd-9.5.6.0-5262417.x86_64.rpm rpm 3347 27e15ba57287bd30f1f761b4c27fd8d58e6ca092
13/07/2017 10:41:32 [INFO] package NEW PACKAGE : VMware-Postgres-extras-sysv x86_64 (none) 9.5.6.0 5262417 /package-pool/VMware-Postgres-extras-sysv-9.5.6.0-5262417.x86_64.rpm rpm 4938 ceb935f772b5d8c8e63d4299992adb6076fa06ea

Once all this is done, it will finish with:

13/07/2017 10:41:32 [INFO] Using update post-install script
13/07/2017 10:41:32 [INFO] Running updatecli to install updates. command={ /opt/vmware/share/vami/update/updatecli '/opt/vmware/var/lib/vami/update/data/job/10' '7.1.0.710' '7.3.0.536' ; /opt/vmware/bin/vamicli version --appliance ; } >> /opt/vmware/var/log/vami/updatecli.log 2>&1 &
13/07/2017 10:41:33 [INFO] Installation running in the background

If you are using the CD upgrade, this will take around 3 minutes in total.
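
If this stage fails, a plain grep on the same log is usually the quickest way to spot the problem (nothing vRA-specific, just filtering the log shown above):

# grep -iE 'error|fail' /opt/vmware/var/log/vami/vami.log | tail -n 20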

 

updatecli.log

Then it invokes updatecli, which you can monitor like this:

 # tail -f /opt/vmware/var/log/vami/updatecli.log

This log is a verbose shell trace and looks like this:

++ date '+%Y-%m-%d %H:%M:%S'
+ echo '2017-07-13 10:41:55 /etc/bootstrap/preupdate.d/00-00-01-va-resources-check done, succeeded.'
+ for script in '"${bootstrap_dir}"/*'
+ echo
+ '[' '!' -e /etc/bootstrap/preupdate.d/00-00-02-check-replica-availability ']'
+ '[' '!' -x /etc/bootstrap/preupdate.d/00-00-02-check-replica-availability ']'
+ log '/etc/bootstrap/preupdate.d/00-00-02-check-replica-availability starting...'
++ date '+%Y-%m-%d %H:%M:%S'
+ echo '2017-07-13 10:41:55 /etc/bootstrap/preupdate.d/00-00-02-check-replica-availability starting...'
+ /etc/bootstrap/preupdate.d/00-00-02-check-replica-availability 7.1.0.710 7.3.0.536
+ log '/etc/bootstrap/preupdate.d/00-00-02-check-replica-availability done, succeeded.'
++ date '+%Y-%m-%d %H:%M:%S'
+ echo '2017-07-13 10:42:19 /etc/bootstrap/preupdate.d/00-00-02-check-replica-availability done, succeeded.'

Take note above: it tells you the time it starts a script, then logs exactly how it is executed, including all parameters (/etc/bootstrap/preupdate.d/00-00-02-check-replica-availability 7.1.0.710 7.3.0.536), and finally what time it finished.
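
Because the log is such a verbose trace, I find it easier to get an overview by filtering for just these start/finish lines (a simple grep, assuming the wording shown above):

# grep -E 'starting\.\.\.|done,' /opt/vmware/var/log/vami/updatecli.log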

If one of those scripts fails, the first thing I do is run it manually, exactly as logged, because that way it logs to stdout. In this example, if it had failed, I would run it including the parameters:

/etc/bootstrap/preupdate.d/00-00-02-check-replica-availability 7.1.0.710 7.3.0.536

This will display all the steps and where it fails. Once you fix your problem, run the script again until it succeeds, then re-try the upgrade. In the past I reverted, but in hindsight I don’t think this is needed.
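
A quick way to confirm that the re-run actually succeeded is to print its exit code straight after it (0 means success):

# /etc/bootstrap/preupdate.d/00-00-02-check-replica-availability 7.1.0.710 7.3.0.536; echo "exit code: $?"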

 

The main events you see in this updatecli.log file:

  • pre-checking, upgrading the management agents, stopping the IaaS services, … (~40 minutes)
28/06/2017 10:00:35 [INFO] Update status: Done pre-install scripts
  • patching VA1 (you see all the steps) (~20 minutes)
  • starting to patch VA2, if distributed:
echo '2017-06-28 10:21:31 /etc/bootstrap/postupdate.d/995-upgrade-replicas starting...'
  • Note: go to VA2 and tail -f /opt/vmware/var/log/vami/updatecli.log there to see the details. This takes around 20-30 minutes as well

Then there is a 10-15 minute delay from when VA2 is done according to this log file until it proceeds with further steps. It will finally finish on VA2, and later on VA1, with:

28/06/2017 10:45:14 [INFO] Update status: Update completed successfully
28/06/2017 10:45:14 [INFO] Install Finished
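
To pick out just these milestone lines from all the shell-trace noise, you can grep for the status messages shown above (assuming the wording is the same in your version):

# grep 'Update status' /opt/vmware/var/log/vami/updatecli.log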

 

At this stage you should see in VAMI that the VAs are upgraded and that you should now reboot. You can confirm this by checking upgrade.log:

cat /usr/lib/vcac/tools/upgrade/upgrade.log

What you see in this log is exactly what is displayed in VAMI!

Reboot

As mentioned, VA1 is rebooted manually by you, and once you can see in the VM console that Postgres has started, it already sends the reboot command to VA2. If VA2 is offline at this stage, VA1 will hang here; it will not carry on unless it can tell the other VA to reboot.

I typically SSH to VA2 and wait for the console alert:

The system is going down for reboot NOW!

Then I know VA1 injected the reboot into VA2, and I should be able to SSH back into VA1 very soon.

Once VA1 has started, it will bring up its services (vcac, vco, rabbitmq, …).

upgrade.log will only state “Waiting all services to start”:

[UTC: 2017-06-26 13:28:55.004731; Local: 2017-06-26 13:28:55.004753] [INFO]: Waiting all services to start…

It will wait for a maximum of 45 minutes. If the services don’t come up in this time, the upgrade will stop at this stage.
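
If the services are slow to come up, I check them directly on VA1 rather than waiting blindly; these service names and paths are from memory, so treat them as a starting point:

# service vcac-server status
# tail -f /var/log/vcac/catalina.out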

If you can fix what’s wrong (get all services up), you can then re-initiate this second phase (the infra upgrade) with:

/usr/lib/vcac/tools/upgrade/upgrade &

Infra upgrade

Keep monitoring upgrade.log:

tail -f /usr/lib/vcac/tools/upgrade/upgrade.log

As mentioned before, the content of this file is what is displayed in VAMI! Unfortunately I don’t have any info on the upgrade logs on the infra side.

If you do the infra upgrade manually, you can monitor the logs in c:\program files (x86)\vmware\… depending on what is being upgraded. If it’s run from the VA they don’t seem to be populated, though I might be wrong.


org.bouncycastle.crypto.InvalidCipherTextException: pad block corrupted

Hello world …

My first post 🙂

I’m using vRealize Automation in a production environment, and pretty much every day I come across something that doesn’t work as advertised, or that I only figure out after spending hours of troubleshooting. Essentially, this blog is going to be my personal KB online.

The latest problem I came across: after upgrading from vRA 7.1 to 7.3, some deployments end up with an error:

org.bouncycastle.crypto.InvalidCipherTextException: pad block corrupted

What gave it away was that some blueprints deploy and some don’t. Those that failed had software components configured. Removing all of them would allow me to deploy; adding any of them back would make it fail.

Then, when I saw ‘Cipher’ in the error message, I finally figured it out. It’s the exact same problem you get when you change the certificate in vCenter: when deploying a machine, vCenter cannot use the new certificate to decrypt the passwords used in the customization script. In this case one of the certificates used by vRA had changed (I didn’t change it, so maybe the updater did?). I re-entered all the passwords used (encrypted strings), saved the blueprint, and everything deploys fine.

Essentially, to solve the problem:

Re-enter all passwords (encrypted strings) used in the blueprint. Make sure to check all properties and software components used in that specific blueprint.
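
If you are not sure which deployments are hitting this, the same exception shows up in the appliance logs; something like the following (log path assumed from the standard appliance layout) confirms it quickly:

# grep -i 'InvalidCipherTextException' /var/log/vcac/catalina.out | tail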

 

 

 
