Monthly patching is failing and pending restart on multiple machines

Good evening. I keep running into this issue. I push monthly patches to the machines in our company, and every month the same machines either fail or get stuck on pending restart. All the patches get pushed, but the cumulative update patches for Windows always fail. If you need more information, I can provide it.

If it’s the same machines giving problems every month, I’d check whether the Windows installation is corrupted.

The first step would be to install one of the failing patches, manually, interactively, and observe whether it gives useful error messages (I expect another failure, but possibly a failure with a useful message).

You can also check \windows\logs\CBS\cbs.log for error messages from the Component-Based Servicing process, which is usually responsible for the patch installations.
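As a rough illustration of that log check, here is a minimal Python sketch that pulls the error-level lines out of a CBS log. It assumes each line carries its level right after the timestamp (for example ", Error"); the function names and the default path are only illustrative, so adjust them to what you actually see in your log.

```python
import re

# Assumption: CBS log lines look like "<timestamp>, Error   CBS   <message>".
ERROR_LINE = re.compile(r",\s*Error\s", re.IGNORECASE)


def find_cbs_errors(lines):
    """Return the lines that are marked with the Error level."""
    return [line.rstrip("\n") for line in lines if ERROR_LINE.search(line)]


def read_cbs_errors(path=r"C:\Windows\Logs\CBS\CBS.log"):
    """Read the log from disk (default path is an assumption) and filter it."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_cbs_errors(handle)
```

Running `read_cbs_errors()` on an affected machine would give a short list to read through instead of the whole log.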

You can also try using the DISM tool to repair your Windows installation, as described in the Microsoft Learn article “Fix Windows Update corruptions and installation failures”.

If you find that is successful in allowing one of your systems to patch, I can get you a Fixlet to run the repair on the remaining systems and you can try patching them again.


Ok let me look into this and test out some of your methods

I can tell you from experience that in certain months the patches from Microsoft are messed up and require multiple reboots. (I tested this manually: I applied the cumulative patch and rebooted manually; the server went down, but when it came back up it was still in a pending reboot state.) I would consider it a bug because it is completely undocumented by Microsoft…

The only problem with BigFix is that the “action pending restart” actionscript command is not smart enough to realize that the server did go down; it relies on the OS’s “pending reboot” state to clear. I took this through Support to the developers a few years ago to try to get it fixed, but I was told that they only go by Microsoft documentation, and since the above is undocumented by MS, they are not willing to adjust their code for what would be a “bug”, which I thought was fair…

For what it is worth, we have “reboot policies” that reboot boxes during specified windows if they are “pending restart”. The problem we were facing is that, because the OS status didn’t change after the reboot, the status of the job never changed and the policy was never reapplied even though it was required… We overcame it by creating a secondary reboot policy that kicks in towards the end of the reboot window IF the box is still in a pending restart state, and that second job clears the entire state/issue.

I second @JasonWalker’s remark about OS corruption. We often see patch installation failures via BigFix that also fail when installed manually, and even via Windows Update itself, and they are often linked to CBS issues. It is possible, though it should be used cautiously, to inspect the Setup event log via a property, which avoids having to manually inspect event logs on each machine. I say cautiously because, depending on the size of your event logs, the inspection can take some time to complete and can, in some cases, exceed the maximum time allowed for a property to evaluate (I recall this once being a hard limit of 10 seconds, but that was many, many versions ago in the BigFix 7.x era).

Creating an analysis and using a client setting can help limit which endpoints process this type of inspection, so you can manage it relatively granularly: e.g. only evaluate on endpoints with the client setting “EvalSetupLog” = “True”, then use a task/fixlet to create or remove that setting on the endpoints you want visibility of. The property I use to pull events from the last 10 days is below; I set it to a refresh interval of at least one day to minimize the impact on the endpoints.

((((month of it as two digits) & "/" & (day_of_month of it as two digits) & "/" & year of it as string) of date ("GMT" as time zone) of it & " " & ((two digit hour of it) & ":" & (two digit minute of it)) of time ("GMT" as time zone) of it) of (time generated of it), descriptions of it, source of it, event id of it) of records whose (time generated of it > (now - 10 * day)) of event log "setup"

@ageorgiev, I know that the BigFix PendingRestart flag with the sha1 of the patch, which fixlets use and which one would expect to be removed after a restart, is not cleared if another restart flag still exists after the restart; I think there are something like 14 different reboot indicators for Windows. I’ve often seen the Windows Update reboot flag cleared while a software or driver update still needs a restart, so its PendingFileRenameOperations entry prevents the BigFix client from clearing its PendingRestart flag, and more restarts are needed to clear it. During that time patch fixlets do not become relevant again, which distorts fixlet applicability and results in a “security by obscurity” situation. I have a property that checks the Windows Update pending restart flag “HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate\Auto Update\RebootRequired”, which gives me slightly better awareness of when a CU is pending a restart or when patch applicability may be misrepresented.

@SLB and @ageorgiev, does the setting _BESClient_ActionManager_PendingRestartExclusions = “:;” help with this situation?

Potentially, if you know which of the 14 restart indicators is impeding things so that you can add an exclusion for it. Although a CU may set a WUA reboot state, other updates may set PendingFileRenameOperations, and as that flag is also used for drivers and software, excluding it could impact other activities. TBH I haven’t investigated it much, as it doesn’t cause us too much of an issue; we have other tools and processes in place that notify users of restart states.

The “stuck” pending restart that I have observed with those “buggy” monthly cumulative patches is not BigFix-triggered. It is OS-related, and usually if you check the PendingFileRenameOperations key after the first reboot you will see some kind of print*dll entry. Again, all of this is on the OS side, and I was able to reproduce it completely outside of BigFix. I am not saying that it happens all the time, or that when it does it happens on every single machine (it may be specific to certain types of machines with certain types of drivers, etc.), but I have seen it, so I thought it worth mentioning.

@itsmpro92, you could, but that would require you to spend enough time on every occasion to document every single value in PendingFileRenameOperations that is known to occur, then populate this setting on every single machine, and hope that no new values occur in the future… If you dedicate enough time to the problem, I guess you can do it with that approach… I never went further than the statement from support and, as I said, created a second reboot policy job to just work around it…


I should note that the proposed value in my previous post for the restart exclusions is a wildcard.

With the setting _BESClient_ActionManager_PendingRestartExclusions=:; all entries in the Microsoft registry value HKLM\System\CurrentControlSet\Control\Session Manager\PendingFileRenameOperations are ignored, because every entry is a path and therefore always contains a colon.
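To illustrate why “:;” behaves as a wildcard, here is a small Python sketch of the matching idea. It is not the client's actual implementation: it assumes the setting is a semicolon-separated list of substrings and that an entry is ignored when it contains any of them. The function names and sample paths are made up for the example.

```python
def parse_exclusions(setting_value):
    """Split the setting on ';' and drop empty parts (assumed format)."""
    return [part for part in setting_value.split(";") if part]


def remaining_entries(entries, setting_value):
    """Return the entries that still count as a pending restart.

    Assumption: an entry is ignored when it contains any exclusion
    string as a plain substring.
    """
    exclusions = parse_exclusions(setting_value)
    return [
        entry
        for entry in entries
        if not any(exclusion in entry for exclusion in exclusions)
    ]


# Hypothetical PendingFileRenameOperations entries, for illustration only.
SAMPLE_ENTRIES = [
    r"\??\C:\Windows\Temp\old.tmp",
    r"\??\C:\Windows\System32\spool\drivers\example.dll",
]
```

With the value ":;" the only exclusion is ":", which every drive-letter path contains, so nothing is left. A narrower value such as "spool;" would only drop the second sample entry.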


Hey, so I manually installed one of the patches and it stated “You must restart your computer for the updates to take effect”. I restarted and went to check whether the update installed successfully using the PowerShell cmdlet ‘Get-HotFix’. It is not showing up there, and it doesn’t show in Add or Remove Programs either.

I looked at the windows log
[SR] Verify complete
[SR] Verifying 100 components
[SR] Beginning Verify and Repair transaction
@2024/7/2:18:13:29.727 Primitive installers committed for repair
@2024/7/2:18:13:29.742 Primitive installers committed for repair
@2024/7/2:18:13:29.758 Primitive installers committed for repair
@2024/7/2:18:13:29.774 Primitive installers committed for repair
@2024/7/2:18:13:29.789 Primitive installers committed for repair

Oh ok. Well, in my case I don’t think you would want to blindly ignore everything, because in this situation you do want to know that the machine is pending a reboot: it is unfortunately a valid state for the OS to be in, and it does require another reboot, which solves the problem (it’s not a permanent problem). The question is how to get the BigFix agent to realize that a reboot has happened and release actions (force them to check their success criteria and allow the ones with “reapply” behaviour to apply).


Here are a few simple but effective things that have got me past the failures in many cases. On the machines where the cumulative update is failing, try these few things.

Delete the contents of c:\windows\temp

Delete all unknown user profiles

Ensure you have at least 10GB free disk space

From an elevated command prompt run the following commands.

sfc /scannow

Dism.exe /online /Cleanup-Image /checkhealth

Dism.exe /online /Cleanup-Image /scanhealth

Dism.exe /online /Cleanup-Image /Restorehealth

Dism.exe /Online /Cleanup-Image /AnalyzeComponentStore

Dism.exe /Online /Cleanup-Image /StartComponentCleanup

If any of those fail, your OS is probably in bad shape. Reboot.
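If you want to run that sequence unattended, something like the Python sketch below could work. It is only a sketch: it assumes it is started from an elevated prompt on the affected machine and that a non-zero exit code means the step failed. The `run_repairs` helper and its `run` parameter are illustrative names, not part of any tool.

```python
import subprocess

# The same commands as listed above, in the same order.
REPAIR_COMMANDS = [
    ["sfc", "/scannow"],
    ["Dism.exe", "/Online", "/Cleanup-Image", "/CheckHealth"],
    ["Dism.exe", "/Online", "/Cleanup-Image", "/ScanHealth"],
    ["Dism.exe", "/Online", "/Cleanup-Image", "/RestoreHealth"],
    ["Dism.exe", "/Online", "/Cleanup-Image", "/AnalyzeComponentStore"],
    ["Dism.exe", "/Online", "/Cleanup-Image", "/StartComponentCleanup"],
]


def run_repairs(run=subprocess.run):
    """Run each command in order and stop at the first failure.

    Returns the failing command, or None if every step exited with 0.
    Assumption: a non-zero return code means the step failed.
    """
    for command in REPAIR_COMMANDS:
        result = run(command)
        if result.returncode != 0:
            return command
    return None
```

Passing a different `run` callable makes it possible to dry-run the sequence without touching the machine.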

If it still fails after the reboot, search the CBS log for the following.

Search for “Failed to resolve package” and find the KB next to it, then download and install the KB it’s complaining about. You might see something like <Failed_Package> or <Missing_Package>.
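A quick way to pull those KB numbers out of a large CBS log is a small Python sketch like this one. It assumes the KB number appears on the same line as the “Failed to resolve package” text, which may not hold for every log; `unresolved_kbs` is just an illustrative name.

```python
import re

KB_PATTERN = re.compile(r"KB\d+", re.IGNORECASE)


def unresolved_kbs(lines):
    """Collect KB numbers from 'Failed to resolve package' lines.

    Assumption: the KB number is on the same line as the message.
    KBs are returned upper-cased, in first-seen order, without duplicates.
    """
    found = []
    for line in lines:
        if "failed to resolve package" not in line.lower():
            continue
        for kb in KB_PATTERN.findall(line):
            kb = kb.upper()
            if kb not in found:
                found.append(kb)
    return found
```

Feed it the lines of the CBS log (for example from `open(path, errors="replace")`) and it gives you the list of KBs to download.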

If all that fails, there’s one last thing you can try: search the CBS log for the failing hotfix, then go into the registry and remove all references to the missing update assembly from the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Component Based Servicing\Packages registry key.


HCL has also provided the following Task in Patching Support to retrieve useful information related to failing patches on Windows:

Task 12010: Collect Patch Diagnostics data, Registry key from Windows Endpoints

Action 1: This task is used to collect below mentioned patch diagnostics data from windows endpoints.

  • List of installed patches
  • Windows Update log (For Windows 10 & 11)
  • CBS logs
  • Events from ‘System’ for past 48 hours
  • BigFix registry keys
  • The results will be uploaded to the BigFix Server under below path: \UploadManagerData\BufferDir\sha1

Action 2: This task is used to collect specific registry from Windows endpoints.

Specify the full path of the subkey. The key name must include a valid root key. Valid root keys are: HKLM, HKCU, HKCR, HKU, and HKCC. Example: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall

Note: If the registry key name contains a space, enclose the full path name in quotes.

The results will be uploaded to the BigFix Server under below path: \UploadManagerData\BufferDir\sha1

Action 3: This task performs a clean-up of the client settings.


Do you have that second reboot policy you can share?

I understand that. What I do is, if BigFix shows pending restart on any machine in our environment, I manually go to the machine and restart it. Even after this restart, it still gets stuck on pending restart. Also, for some baselines I push out, I have the post-action restart set for 3 days, so it should restart automatically. About 250 machines are still pending a restart, and it’s been about 2 weeks.

That’s a different issue, by the way, not the failing Windows updates.

I can, but it is all tied to how we do patching. We have “Patch Start” and “Patch End” parameters, which are just times in local server time, and those are the parameters we use in the relevance of the two reboot policies.
Primary Reboot policy:
pending restart AND (if ((exists setting "Patch Start" whose (exist value of it) of it and exists setting "Patch End" whose (exist value of it) of it) of client) then ((value of setting "Patch Start" of it as time <= now and value of setting "Patch End" of it as time > now) of client) else false)

Secondary Reboot policy:
pending restart AND (if ((exists setting "Patch Start" whose (exist value of it) of it and exists setting "Patch End" whose (exist value of it) of it) of client) then ((value of setting "Patch Start" of it as time <= now and value of setting "Patch End" of it as time > now and (value of setting "Patch End" of it as time - 1 * hour) <= now) of client) else false)

So if you compare the relevances the secondary only becomes relevant in the last hour of the patch window (Patch End - 1 hour <= now) and that secondary job essentially acts as “clean-up” for those instances where the first reboot job doesn’t clear it.
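For anyone who finds the relevance hard to read, the same window logic can be sketched in Python. This only models the time comparisons; the function names are made up, and a missing “Patch Start” or “Patch End” setting is represented here as None.

```python
from datetime import datetime, timedelta


def primary_relevant(pending_restart, patch_start, patch_end, now):
    """Pending restart, and now falls inside [Patch Start, Patch End)."""
    if patch_start is None or patch_end is None:
        return False  # mirrors the "else false" branch of the relevance
    return bool(pending_restart) and patch_start <= now < patch_end


def secondary_relevant(pending_restart, patch_start, patch_end, now):
    """Same as primary, but only in the last hour of the window."""
    if not primary_relevant(pending_restart, patch_start, patch_end, now):
        return False
    return patch_end - timedelta(hours=1) <= now


# Example window (hypothetical times): 01:00 to 05:00.
WINDOW_START = datetime(2024, 7, 2, 1, 0)
WINDOW_END = datetime(2024, 7, 2, 5, 0)
```

With that example window, a machine still pending restart at 02:00 matches only the primary policy, while at 04:30 it matches both, which is when the secondary “clean-up” reboot can fire.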

Agreed. It does not sound like what you are facing is what I am describing.

Hello Iamabeginner,

We started getting the same issue in October last year and are still working with Microsoft. Laptops started failing to install the monthly CU, and once a device fails to install that CU it will not install any subsequent CU.
We proved it has nothing to do with BigFix.
We have had multiple tickets open with MS and have tried every step suggested in the replies to your post; the only fix is an in-place upgrade. However, we have found that the same laptops may be OK for a couple of months and then fail again.
Through testing with MS we can see that the component store gets corrupted and gets fixed by the various processes, but then running the update breaks it again. Interestingly, other updates such as .NET work fine.
We have had over 250 physical devices and a couple of AVDs affected.
We complained to MS, and they got a very senior member of MS in the UK involved, who escalated it via our customer manager. Only yesterday MS admitted that this is a global issue, with more and more customers around the world affected.
I strongly suggest you log a ticket with Microsoft and push it. What we found is that they work with you and fix the corrupted elements, but applying the CU then corrupts other elements, so it is a pointless exercise; still, it’s the only way you can push them to do something.
Apparently the product group is looking at a fix, but there is no guarantee.