So, I got my butt handed to me by my boss, and now I have more BigFix questions

BCannon · December 15, 2017, 5:07pm

This may be a rant, but I do have legitimate questions.

A little background. I’ve been trying to play catch up with patches to bring our environment current. I made the mistake of deploying some new content and then going back to do patches sequentially. Last night’s baseline broke some of the execs’ machines. Long story short, boss comes in to rip me a new one, and raises the question . . . how much can we trust what BigFix reports about the applicability of patches? Using this tool, or any tool for that matter, for patching, we put our trust in that application manufacturer or process to update and fix tools we use every day. I’ve always heard to verify patches in a test environment before deploying. In a perfect world I’d do this, but I’ve just been going for it in our production environment. I know other BigFixers who are in a similar boat: build a baseline and let it go. But how can we know that what we’re doing won’t break?

Similarly, how can Windows update say there are no applicable updates and BigFix says there are?

I have tried using Web Reports to show the progress being made, but the information there seems inconsistent. I can pull the report and the data seems legit, but when my VP or others pull the same report with the same criteria, their numbers are different. That does not build confidence in the product.

Sorry for the long post. Needless to say, my personal confidence is low for being ripped a new one, but my confidence in BigFix has waned a bit as well.

Thanks,
Brian

TimRice · December 15, 2017, 6:27pm

Welcome to the Career of Computer Support!

I feel your pain, but it’s not caused by BigFix per se. It sounds like the systems you need to manage have never really been properly managed before. If they are missing lots of patches, it can take a while to get everything up to where it should be.

First thing to understand about BigFix … the content it provides is NOT going to match Windows Update exactly for a couple reasons.

BigFix provides more than just Microsoft content. There is content from Adobe, Sun, etc. Even the Patches for Windows site will not match WU exactly because IBM occasionally includes content requested by customers. Things like Hotfixes. Because of this, you will see content in BigFix that you will not see in Windows Update.
Windows Update (WU) provides more than just Security Patches. It includes some stability patches as well. IBM really only commits to providing the Security Patches on a timely basis. Because of this, you will sometimes see content in WU that is not in BigFix.

You don’t indicate which OS’s you are supporting.

IBM relies on Vendor documentation (i.e. Microsoft) when creating the Fixlets that deliver the patches to your computers. Very (very, very when you consider how many Fixlets they provide) rarely IBM gets the Relevance wrong, but there are enough of us using the product that they get notified and fix the issues promptly.

Sometimes, Microsoft makes a mistake with a patch, or with the documentation for the patch. When they publish corrections, IBM usually catches it and corrects their Fixlets within 12/24 hours.

In the end, BigFix is a tool. If you can’t implement processes to protect the computers you have to manage, I don’t care if you use BigFix, WSUS, or SCCM, things will get broken occasionally.

I’ve been supporting Windows computers since the days of DOS and Windows 3.1. The biggest lesson I’ve learned that applies here is that automation tools without appropriate policy and procedure just turn into faster ways to break systems.

You need to talk with your Management and help them to understand this, and to let you implement some testing procedures. Even if all it amounts to are a handful of computers you can push patches to BEFORE you push them more globally. That way you can install some of the more common or critical apps on them and have a better feel for what might go wrong, before you break a critical (or Senior) system.

Personally, I have a very high level of trust in what BigFix reports to me. I currently use it to patch and manage Windows, Macintosh, Red Hat, CentOS, AIX, and Solaris systems.

In every case when someone has tried to blame BigFix for “breaking” something, or providing bad information, I’ve been able to track it back to a Person doing something they shouldn’t have, or something odd on a specific computer (misconfiguration, glitch, etc).

In the end, BigFix is really just a highly configurable remote scripting tool, with a high level of visibility into endpoints and a fairly effective file delivery system.

I can use a hammer to build a house, or tear it down. It’s all how the tool is applied.

BTW: I’m NOT an IBM employee. I’ve just used BigFix since 2003, and have never found a better tool for my environment of +46k endpoints.

BCannon · December 15, 2017, 7:42pm

Hi Tim, and thanks.

For starters, the majority of our endpoints are Windows. On top of that, most of those are virtualized. One thing I do know is that patch management wasn’t handled well before I came on board.

Our desktop guys, our server guys, and I have all talked and know to expect things to break, especially when playing catch up. The baseline I ran last night was for 2014; add to that some more current updates that have already been applied to those machines, and things are going to break. I know that, they know that. Sometimes I think the boss forgets that. Plus when an owner/exec machine blows up, tension levels escalate.

I think everything you’ve mentioned makes sense, and it is how I’ve approached BigFix this entire time I’ve used it. The difficult part I’m running into is how to adequately express all of these things to those above me and convince them. I mean, these are three year old updates, I’m pretty certain they’re verified.

Obviously, no one likes to get chewed out. My boss had a very rough week, and BigFix is a sensitive subject anyway, because they don’t understand it or have confidence in it, and in me because I’m the one running it.

TheTick · December 15, 2017, 8:13pm

Just to add to what Tim mentioned.

I know that in the past I have had the question “why does WSUS/WU report this and BigFix reports something different?” This can be tedious to answer, as really the only way to prove it is to sit down and look at each system (or a handful) and check it out. You can piece together the relevance to show what is needed (although this is getting harder with the rollup patches) and also compare the binaries that the KB article defines, then take that to the system and see. This is also something that IBMers do when they POC, because that is the question they get, especially from the people who managed the patching environment previously. I personally find this the hardest part about BigFix: proving that it is right.

I just got hammered because BigFix rebooted a domain controller (whole other story) and the server did not come up right and now some people are blaming BigFix. It could not possibly be that the server had about 15 different patches installed over the past 4 months and it has not been rebooted. Oh well, that is what we deal with sometimes.

Martin

BCannon · December 15, 2017, 9:07pm

Good points, Martin. Thank you.

After discussing a few things with our server and desktop guys, I think I have a new plan of attack. For example, up to now when I set up a baseline, I’ve been going off of release date regardless of severity. I think that is part of my problem. Shotgunning everything. So, I’m going to redo my baselines based on severity then release date going forward. I think that will provide a better idea of what is necessary from a criticality standpoint.

Just out of curiosity, what is the approach you guys take when setting up baselines? I’m still learning this, but feel decently confident in what I’m doing. Explaining it to others is another matter.

TimRice · December 15, 2017, 9:29pm

@BCannon,
A piece of advice.

Limit the number of Fixlets you put into your baselines. The BES Client gets “tunnel vision” while it is evaluating the contents of a Baseline. It won’t do anything else until it finishes with the evaluation cycle for the baseline. I’ve heard recommendations that range from 100-250 MAX Fixlets per baseline.

I had one Console Operator who just kept appending the Monthly Microsoft patches to a single baseline. It got so big I couldn’t edit the thing to clean it up, I had to simply delete it and start from scratch with new baselines for his group.
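
A minimal sketch of the "cap the baseline size" idea, in plain Python rather than anything BigFix-specific. The function name, the constant, and the integer Fixlet IDs are hypothetical; the cap of 100 is just the low end of the 100-250 range quoted above, not an official limit.

```python
from typing import List

# Assumed cap: the low end of the 100-250 range mentioned in the thread.
MAX_FIXLETS_PER_BASELINE = 100


def split_into_baselines(
    fixlet_ids: List[int], max_size: int = MAX_FIXLETS_PER_BASELINE
) -> List[List[int]]:
    """Split a list of Fixlet IDs into consecutive chunks of at most max_size."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    return [
        fixlet_ids[i:i + max_size]
        for i in range(0, len(fixlet_ids), max_size)
    ]


if __name__ == "__main__":
    # 260 hypothetical Fixlet IDs end up in three baselines.
    chunks = split_into_baselines(list(range(1, 261)))
    print([len(c) for c in chunks])  # [100, 100, 60]
```

Each resulting chunk would become its own baseline, so no single baseline grows past the chosen cap.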

The way I personally handle Monthly Windows patching Baselines is with Two Baselines each month.

Critical/Important
Medium/Low/Other

I then name the baselines something like

2017-12 : Windows Critical/Important Patches
2017-12 : Windows Medium/Low/Other Patches
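
As a rough illustration of the two-baselines-per-month scheme and naming above, here is a small Python sketch. It assumes each Fixlet is available as a hypothetical `(fixlet_id, severity)` pair; how you export that from BigFix is outside the sketch, and anything without a Critical or Important rating is treated as "Other".

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Severities that go into the first baseline of the month.
HIGH_SEVERITIES = {"critical", "important"}


def baseline_name(year: int, month: int, tier: str) -> str:
    """Build a name like '2017-12 : Windows Critical/Important Patches'."""
    return f"{year:04d}-{month:02d} : Windows {tier} Patches"


def group_for_month(
    fixlets: Iterable[Tuple[int, Optional[str]]], year: int, month: int
) -> Dict[str, List[int]]:
    """Sort (fixlet_id, severity) pairs into the two monthly baselines."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for fixlet_id, severity in fixlets:
        normalized = (severity or "").strip().lower()
        if normalized in HIGH_SEVERITIES:
            tier = "Critical/Important"
        else:
            # Medium, Low, unspecified, or missing severity all land here.
            tier = "Medium/Low/Other"
        groups[baseline_name(year, month, tier)].append(fixlet_id)
    return dict(groups)
```

For example, `group_for_month([(1, "Critical"), (2, "Low")], 2017, 12)` puts Fixlet 1 under "2017-12 : Windows Critical/Important Patches" and Fixlet 2 under "2017-12 : Windows Medium/Low/Other Patches".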

When I add patches to the Baseline, I make sure I pick them from the “All Fixlet Messages” list, not the “All Relevant Fixlet Messages” list. This way if any new computers show up needing a different fixlet, the Baseline covers them without me needing to update it.

After you’ve used BigFix for a while, the Baselines will begin to pile up. Be sure to move them and their Actions into a Custom site that doesn’t have any computers subscribed to it all the time. This will reduce the workload on the BES Clients and keep things responsive. You can then temporarily join new computers to the custom site so they evaluate the older baselines, then once they are caught up, unsubscribe them from the site.

BCannon · December 15, 2017, 9:49pm

Awesome. Other than the naming convention, that’s how I’m trying to set these up as well. I think going forward this will cause a lot less stress. Needless to say it’s been a rough day, but I think learning from this is nothing but a good thing.

Thanks for everyone’s input!

BCannon · December 15, 2017, 9:52pm

Just out of curiosity on the fixlet limit per baseline, is that total fixlets per baseline or applicable?

TheTick · December 15, 2017, 9:58pm

The worst part of starting up with BigFix is the visibility you now have. I say the worst part because you really see the state of your environment and have a “WTH?” moment. Then you start figuring out how to catch up. Typically, I find trying to catch the entire environment up at one time is really bad, because you could spend hours on each machine trying to patch (well, hours for the baseline to apply) and you could end up with hundreds of patches to be applied.

A couple methods to maybe try:

Try to start with the critical security patches. You could just do this by year for each baseline, but it could be too many (as stated above), so you might have to break them into smaller baselines.
Find the critical security patches with high numbers of relevant systems and start with those.

Once you are caught up, life becomes much easier. I would also work on a strategy with your server/workstation build teams to ensure your baselines cover patches needed for the builds. If the build was last updated in say 2015, then you need baselines going all the way back to 2015. If they update every quarter, then you can cut down on the baselines.

Just some thoughts.

TimRice · December 16, 2017, 1:30am

The limit is the number of Fixlets you add to the Baseline.

jgstew · December 18, 2017, 11:40pm

Probably a good idea to write up some best practices for patching based upon all of this.

If you are very behind on patching, I would recommend taking care of all critical / important patches regardless of release date first before other patches. Also, I would not apply hotfixes or anything that doesn’t have a severity rating at all unless you know you need them. Also you should be more careful about any patch that doesn’t have a default action, which can be a signal that it isn’t a typical patch that should be widely deployed.

It is also a good idea to send out the patches in smaller batches for catch up reasons so that if something does go wrong, you are more likely to figure out what happened, so you might set a goal of catching up on X patches per week until all caught up.

Patches should be rolled out on different schedules across the fleet so that you can catch problems early. You should always test with a smaller sampling of systems first to make sure there are not issues before rolling it out everywhere.

It is helpful to tag certain machines as always getting updates last so they are not negatively affected. This is generally going to be VIPs / Execs, and External / Customer facing services / servers.

Another thing you can do is tag some systems as always getting updates first. These could be a variety of test systems that you and others in IT have, as well as some others you designate.

Then for all the systems in between you could use some randomized distribution to spread out the way they are patched between the first set and the last.

All of the above can be automated with some relevance that can be added to each baseline so that you can create just a single baseline and a single action, but the relevance causes it to roll out using the above criteria automatically. I have examples of this on BigFix.Me
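
To make the first / last / randomized-in-between idea concrete, here is a minimal Python sketch of the logic only. It is not BigFix relevance and not the examples from BigFix.Me; the function name, the `first` and `last` sets, and the `middle_days` parameter are all hypothetical. It hashes the computer name so that each machine always lands on the same day.

```python
import hashlib
from typing import AbstractSet


def rollout_day(
    computer_name: str,
    first: AbstractSet[str] = frozenset(),
    last: AbstractSet[str] = frozenset(),
    middle_days: int = 5,
) -> int:
    """Return the day offset (0 = first wave) on which a computer is patched.

    Tagged "first" machines get day 0, tagged "last" machines get the day
    after the middle window, and everything else is spread over days
    1..middle_days using a stable hash of the computer name.
    """
    if middle_days < 1:
        raise ValueError("middle_days must be at least 1")
    if computer_name in first:
        return 0
    if computer_name in last:
        return middle_days + 1
    digest = hashlib.sha256(computer_name.encode("utf-8")).digest()
    # The first 4 bytes of the digest give a stable pseudo-random number.
    return 1 + int.from_bytes(digest[:4], "big") % middle_days
```

A hash is used instead of a random number so the assignment is repeatable: a machine does not jump between waves from one patch cycle to the next. If a machine is tagged both first and last, this sketch treats it as first.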

Another option is to use the WebUI Autopatch functionality to select patches based upon criteria, then create different schedules for those patch sets based upon computer groups. This is a set it and forget it approach which is quite handy.

jgstew · December 18, 2017, 11:51pm

It also isn’t a bad idea to do some investigation into some of your systems in general, and you can do that with bigfix. As with everything, test actions on a few systems first.

You can run sfc /scannow to validate Windows files: https://bigfix.me/fixlet/details/6038 (this is old, may require wow redirection turned off)

Get the results: https://bigfix.me/analysis/details/2994541

Check for all windows updates and output results to a file: https://www.bigfix.me/fixlet/details/23123

Get the results from windows update check above: https://bigfix.me/analysis/details/2998507

Get info about current windows update setting in general: https://bigfix.me/analysis/details/2998397

Run windows update troubleshooter: https://www.bigfix.me/fixlet/details/21363

steve · December 19, 2017, 6:22pm

I would also add, be careful about mixing current cumulative patches with older individual patches (e.g. 2017 cumulative with 2015 bulletins) in the same baseline or install/reboot cycle. In my experience, Microsoft is not testing/building interoperability between these different types of patches, and you could experience problems with blue screens, etc. So either order your patches carefully, or just deploy different date ranges separately.

Once you get caught up, this shouldn’t be an issue, but right now it will be.

jgstew · December 19, 2017, 8:38pm

I have also been hearing this feedback about issues caused by the mix of patches Microsoft is releasing. Traditionally it has been pretty safe to throw everything at a system, since BigFix invokes the patches using the Windows Update mechanism, so generally you would expect it to reject something if there is somehow a problematic interaction at that level. But specifically with this new mix of cumulative updates, that hasn’t always remained true.

When getting caught up on patching, it is probably best to start with cumulative patches before doing others.

JasonWalker · December 20, 2017, 2:15am

I also prefer to start with the cumulative rollups, as that will make a lot of the older updates non-relevant and I can skip them entirely. I have a lingering fear this may burn me some time, but it hasn’t happened yet.

anon87818915 · December 20, 2017, 1:44pm

Not to mention that some of these recent rollups make changes that require a reboot to fully implement. It’s possible that a rollup applied a change, but a subsequent action’s relevance didn’t fully pick up on it (MS has stated they are not fully back-testing legacy patches) and that action replaces something that may be a requirement for something else in the rollup.

Normally this isn’t an issue, but if you are years behind in patching it is now a fully plausible situation you might find yourself in.

Like others have stated, a good strategy to follow is:

Identify a sample set of systems for acceptance testing, and if one of those systems is primarily used by an executive or serves a mission-critical need, find another system.
Apply the most-recent (and approved for organization deployment, if required) security rollup patch MS has released to the sample machine set, then ensure a reboot. You could also include any other MS patches that were released around the same time (since these would have a better likelihood of being part of MS’s testing).
Deploy the remaining relevant patches you wish, preferably in smaller chunks, ensuring you look to see if there are any notes added that might indicate known issues.
Ensure the machines undergo at least 1 reboot following the above so any problems that may only be encountered following a reboot are found. Try using your mission critical applications and services to confirm these continue to function as expected. If things look good, proceed to a larger deployment (or pilot deployment especially if you have a large enterprise), again applying the most-recent rollup patch first.
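
The ordering in the steps above can be sketched as a simple plan builder. This is only an illustration in Python with hypothetical field names (`name`, `released`, `is_rollup`). It makes three assumptions that go beyond the steps: older rollups are dropped in favor of the newest one, the remaining patches go out oldest first, and a reboot follows every chunk. In practice you would also re-check relevance after the rollup, since many older patches may no longer apply.

```python
from datetime import date
from typing import Dict, List, Tuple

Step = Tuple[str, List[str]]


def plan_catch_up(patches: List[Dict], chunk_size: int = 25) -> List[Step]:
    """Build an ordered list of ("deploy", names) and ("reboot", []) steps."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    steps: List[Step] = []

    # Step 1: the most recent rollup goes first, followed by a reboot.
    rollups = [p for p in patches if p["is_rollup"]]
    if rollups:
        latest = max(rollups, key=lambda p: p["released"])
        steps.append(("deploy", [latest["name"]]))
        steps.append(("reboot", []))

    # Step 2: remaining non-rollup patches in small chunks, oldest first
    # (assumed order), with a reboot after each chunk (assumed policy).
    remaining = sorted(
        (p for p in patches if not p["is_rollup"]),
        key=lambda p: p["released"],
    )
    for i in range(0, len(remaining), chunk_size):
        chunk = remaining[i:i + chunk_size]
        steps.append(("deploy", [p["name"] for p in chunk]))
        steps.append(("reboot", []))
    return steps
```

Running the same plan first against the sample set and then against the pilot and wider groups keeps the order identical at every stage.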

Now that you have your enterprise patching current, you should be able to adjust to a more fluid and/or automatic strategy to keep it this way. Just always remember to test and verify everything prior to full rollout. Fixing a handful of non-critical machines manually is a pain, but manageable. Fixing your entire enterprise when it’s down and costing thousands/millions in losses every hour… better start practicing “Would you like fries with that?”

GregD · December 20, 2017, 7:43pm

All of the above is good advice…

“Stagger your releases” is probably the most important thing. Have a pilot group of a chosen few people with sensitive apps, or who are vocal, but don’t shotgun the entire department… have a tier 1, tier 2, tier 3, do the execs dead last… Force a reboot post patching, as you know they don’t apply until the PC restarts, if not right after patching then on the weekend… Wait a week after your pilot, make sure uptime looks like they rebooted, and make sure you hear back from those users that all looks well… In writing (email) is even better! Make them part of the process…

When faced with a similar situation (SCCM filter had missed a lot of patches over 5 years before I was handed patching with BigFix), I was nervous about patching PCs with 4-year-old stuff… I made sure I was caught up on service packs and that month’s quality updates first (you never know, they might remove older patches from relevance and reduce the number you need to catch up). Then I did small baselines of critical/important by year, with reboots on small groups, then big groups, then a week later another batch… etc… Apparently pushing super old relevant patches out of sequence does not break much!

As far as people coming to complain to you first, that’s the burden of a BigFix admin… “what are you pushing today?” - more like “What am I not pushing today?”

jgstew · December 20, 2017, 7:50pm

I would also add that while rare, it is possible for machines that have always been fully patched to break in some way when MS releases a new patch. I feel like this is actually happening slightly more often than it used to, while still being rather rare. I’ve had this happen where machines were configured to also get updates from Windows Update, and it was actually through Windows Update that they ended up breaking, not through BigFix where a more careful rollout was in place. In this case it was BigFix to the rescue to block the problematic updates from being delivered through Windows Update on any machines that didn’t have it yet, as well as rolling back the machines that already did.

The issue you run into when doing a catch up of patching is that you have the above situation compounded by every patch you are missing, plus extra oddities of not being current and patches potentially overlapping.

In a way, BigFix is a bit too good at rolling out tons of patches very quickly to all of your machines when they are all behind.

BCannon · December 21, 2017, 5:56pm

Wow, thanks everyone. I haven’t checked back in here for a couple days. All of this is great advice, and I think I’m in much better shape going forward because of it. Now I have some serious reading to do to catch up to all of the additional comments. Thanks again everyone!

BCannon · December 21, 2017, 6:53pm

Now that I’ve caught up with reading everyone’s comments, I thought I’d give a small update and lessons learned from this.

The biggest mistake I realized I made was how I set up my baselines. I’m sorry to admit I basically set them up backwards. Rather than going by severity and then release date, I basically set them up strictly by release date. I’m grimacing writing that as much as I would be if I were reading it. But, lesson learned. I hate that I broke a few machines, but everyone seems to have settled down and we’re all friends again. I went back, removed all my old baselines, and then recreated them using only Critical and Important updates. For the older releases I created yearly baselines, and the closer I got to current, I divided them by quarters, then months.
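
A minimal sketch of that kind of year / quarter / month bucketing, in Python. The cutoffs are assumptions for illustration only (two or more calendar years old gets a yearly bucket, last year gets quarters, the current year gets months); the function name is hypothetical.

```python
from datetime import date


def baseline_bucket(released: date, today: date) -> str:
    """Return a bucket label that gets finer as the release date nears today."""
    age_years = today.year - released.year
    if age_years >= 2:
        # Old releases: one baseline per year, e.g. "2014".
        return f"{released.year}"
    if age_years == 1:
        # Last year's releases: one baseline per quarter, e.g. "2016-Q3".
        quarter = (released.month - 1) // 3 + 1
        return f"{released.year}-Q{quarter}"
    # Current-year releases: one baseline per month, e.g. "2017-11".
    return f"{released.year}-{released.month:02d}"
```

With `today` set to December 21, 2017, a patch released in May 2014 lands in "2014", one from August 2016 in "2016-Q3", and one from November 2017 in "2017-11".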

Shockingly (sarcasm) the number of applicable machines reduced significantly!

Another meeting with my bosses and server folks led to more questions about why certain reports were showing machines needing old baselines. What I suspected, and proved correct, was that redoing all of the baselines led to more accurate reporting of applicable updates. In other words, baselines containing Low, Unspecified, etc. Fixlets were showing as applicable, I would run those baselines, and boom, things would break.

Anyway, I think if I keep typing I’ll just ramble on and no one wants that.

Thanks again for all of the tips and advice, I truly appreciate it.