Retry action if failed is not working consistently

dgendera · August 13, 2021, 2:22pm

All,
We’re running our patch cycle on a very aggressive schedule: we need to reach 95% within 7 days after Microsoft releases the patches. The majority of machines download, install, and reboot just fine. But there is a subset of machines where we have difficulty meeting that target.

Analyzing the logs, I see retry behavior that is not consistent, and maybe some of the BigFix gurus in this forum have an idea what could be causing it.

We deploy our patches with the following retry mechanism: if the patch fails, retry 5 times, waiting 1 hour between attempts.
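To make it easier to check a client log against that retry setting, here is a minimal sketch of when each retry could start at the earliest. It assumes every wait starts at the moment the previous attempt fails and that each attempt fails immediately; the function name and the example timestamp are hypothetical, and this only models the timing, not how the BigFix client actually schedules retries.

```python
from datetime import datetime, timedelta


def expected_retry_times(first_failure, retries=5, wait=timedelta(hours=1)):
    """Earliest start time of each retry, assuming each attempt fails at once."""
    return [first_failure + wait * (i + 1) for i in range(retries)]


# Hypothetical failure time, chosen only for illustration.
failure = datetime(2021, 8, 13, 8, 41)
for n, t in enumerate(expected_retry_times(failure), start=1):
    print(f"retry {n}: not before {t:%H:%M}")
```

If the log shows no new attempt well after the first of these times, the retry was most likely never scheduled rather than just delayed.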

Checking one client, I see the following:

The SSU patch is installed and returns 0.
The Cumulative Update install starts.
While the command is running, the machine is rebooted.

After the reboot, nothing happens regarding the patch action. The console shows the patch status for this machine as Failed. I would expect the action to be retried on the machine after 1-2 hours, but that is not happening; it has now been like this for 2+ hours.

Log Snippet
At 08:36:35 -0400 - actionsite (http://bfixroot.pg.com:29450/cgi-bin/bfgather.exe/actionsite)
Command started - waithidden "C:\WINDOWS\system32\wusa.exe" "C:\Program Files (x86)\BigFix Enterprise\BES Client\__BESData\Enterprise Security\__Download\ssu-19041.1161-x64_e7e052f5cbe97d708ee5f56a8b575262d02cfaa4.msu" /quiet /norestart (group:435455,action:435459)
Command succeeded (Exit Code=0) waithidden "C:\WINDOWS\system32\wusa.exe" "C:\Program Files (x86)\BigFix Enterprise\BES Client\__BESData\Enterprise Security\__Download\ssu-19041.1161-x64_e7e052f5cbe97d708ee5f56a8b575262d02cfaa4.msu" /quiet /norestart (group:435455,action:435459)
Fixed - MS21-AUG: Servicing Stack Update for Windows 10 Version 20H2 - Windows 10 Version 20H2 - KB5005260 (x64) (fixlet:500526001)

Start installing CU Patch
At 08:37:18 -0400 - actionsite (http://bfixroot.pg.com:29450/cgi-bin/bfgather.exe/actionsite)
Command started - waithidden "C:\WINDOWS\system32\wusa.exe" "C:\Program Files (x86)\BigFix Enterprise\BES Client\__BESData\Enterprise Security\__Download\windows10.0-kb5005033-x64_ebab415d7a65f0b33f93e9a30875d74baa8930a7.msu" /quiet /norestart (group:435455,action:435474)

…
At 08:41:37 -0400 -
Starting client version 10.0.2.52
At 08:41:38 -0400 -
Initializing Site: actionsite

After that, the normal evaluation cycle occurs, but the actual patch action is not being retried!
I do see the regular DownloadPing command messages for other actions, and some policy actions are running.

At 10:15:44 -0400 -
DownloadPing command received (ID=197028)
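The pattern in the snippet above (a "Command started" line with no completion line before "Starting client version") can be searched for across client logs. The following is a rough sketch, not a BigFix tool: the function name is hypothetical, the "Command failed" prefix is an assumption about the log format (only "Command started" and "Command succeeded" appear in the snippet), and the sample lines are shortened stand-ins rather than real log output.

```python
def find_interrupted_commands(lines):
    """Return 'Command started' lines with no completion before a client restart."""
    interrupted = []
    pending = None  # most recent command that started but has not finished
    for raw in lines:
        line = raw.strip()
        if line.startswith("Command started - "):
            pending = line
        elif line.startswith(("Command succeeded", "Command failed")):
            # "Command failed" is an assumed prefix, see the note above.
            pending = None
        elif line.startswith("Starting client version") and pending is not None:
            interrupted.append(pending)
            pending = None
    return interrupted


# Shortened, made-up log lines that mimic the shape of the snippet.
sample_log = [
    "Command started - waithidden wusa.exe ssu.msu /quiet /norestart (action:1)",
    "Command succeeded (Exit Code=0) waithidden wusa.exe ssu.msu /quiet /norestart (action:1)",
    "Command started - waithidden wusa.exe cu.msu /quiet /norestart (action:2)",
    "Starting client version 10.0.2.52",
    "Initializing Site: actionsite",
]

for command in find_interrupted_commands(sample_log):
    print("interrupted:", command)
```

Running this over the logs of the affected machines would show whether they all share the same interrupted-command pattern.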

What we currently do is create a new action on Monday targeting devices where the patch install might have failed. That increases our overall success rate, but it is a bit of overkill and extra work we would like to avoid.

Any ideas or suggestions as to what could be the reason for this behavior?

SLB · August 13, 2021, 3:13pm

A couple of thoughts.

1. Does the action report as failed for the endpoint in question? If it doesn’t, then maybe your retry on failure isn’t being triggered.
2. Do you have a “reapply while relevant” setting enabled when you deploy the action? If the action is interrupted by a reboot after the SSU, while the client is pending download and processing the action, maybe that somehow prevents it from being reapplied when no failure is recorded.
3. How are these being deployed: as a baseline, a multigroup action, etc.? I use baselines with 3 component groups: all SSUs in the first group, patches in the 2nd group, and a “Patch Complete” client setting creation in the last group. Reboots are triggered outside of the patch actions and only become relevant when the patch completion flag is detected, which helps avoid reboots while other baseline components are being processed.

dgendera · August 13, 2021, 3:42pm

To answer your questions:

1. The action is not explicitly reported as Failed, but checking the logs later I see the action was aborted. This happens while the wusa.exe command is running for the cumulative patch.
2. Yes, we do have that setting enabled.
3. Correct, we use a baseline with a setup similar to yours: SSU Group | Patch Group, with a post-action reboot on the baseline once it completes.

To be clear, the reboot that occurs is not triggered by the baseline; it looks like it is triggered outside BigFix. For one client I suspect the lid was closed and later re-opened, which might explain the action being aborted, from what I can see in the client logs. This is happening on 0.2 - 0.5% of our client base.
As explained earlier, we do trigger a new action on Monday for the failed clients, so that will eventually resolve the issue on some, but not all, of the clients, assuming they are online etc.

Can you say a bit more about the “Patch Complete” client setting creation?

Rgds Denis

SLB · August 13, 2021, 4:03pm

It’s a basic fixlet I created that will be true if a pending restart exists and the client setting “PatchingComplete” isn’t the string value “True”, along with some other checks that are specific to our environment. The actionscript simply sets the client setting to True.

The restart fixlet we have, which also detects the pending restart, also checks for the client setting equalling True, so it is only relevant when pending restart = True and “PatchingComplete” = True. For our environment, where we can’t use post-action methods in the baselines, this allows us to manage reboots so that they do not occur while baseline components may still be being processed.
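The gating logic of those two fixlets can be sketched roughly as follows. This is Python rather than BigFix relevance, the function names and the dictionary used as a settings store are hypothetical, and the environment-specific checks mentioned above are left out.

```python
def patch_complete_fixlet_relevant(pending_restart, settings):
    """Flag fixlet: pending restart exists and the flag is not yet 'True'."""
    return pending_restart and settings.get("PatchingComplete") != "True"


def restart_fixlet_relevant(pending_restart, settings):
    """Restart fixlet: pending restart exists and the flag is 'True'."""
    return pending_restart and settings.get("PatchingComplete") == "True"


settings = {}           # hypothetical stand-in for the client settings
pending_restart = True  # e.g. a patch earlier in the baseline needs a reboot

# While the patch groups are still running, the restart is held back.
print(restart_fixlet_relevant(pending_restart, settings))         # False
print(patch_complete_fixlet_relevant(pending_restart, settings))  # True

# Last baseline group: the flag fixlet's action sets the client setting.
settings["PatchingComplete"] = "True"

# Only now does the restart fixlet become relevant.
print(restart_fixlet_relevant(pending_restart, settings))         # True
```

Because the flag is set by the last group in the baseline, the restart cannot become relevant while earlier components are still being processed.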