Staggering Actions

FatScottishGuy · June 16, 2022, 10:42pm

It’s like I always say about this forum: every day is a learning day. But I want to check that what I’ve learned is correct, and that what I originally believed to be correct is indeed incorrect.

When staggering an action (as an example) over 15 minutes and targeting 10,000 endpoints in an action set to last for 1 hour, how exactly is the staggering done?

My perception of this, based on documentation and the English language, is that it would take the 1 hour period, split it into four 15-minute intervals and essentially run the action on 2,500 endpoints every 15 minutes.

After reading a reply by @trn it seems I was way off with this!

Does it, in fact, apply a random wait, between 0 and 15 minutes, on the client after the fixlet becomes relevant and then run?

If this is the case, then a full job could theoretically (and I’m being super extreme here) be done in under 15 minutes, provided there were no failures and UDP was enabled everywhere on a super fast network.

Scenario 2:

If I did the same job but staggered the actions over 1 hour, then even though the job itself has ended, there is the possibility (in the opposite extreme to the above) that the action could run anywhere up to 1 hour after the action has ended.

Hopefully someone can confirm this and maybe update documentation to be way more precise.

JasonWalker · June 17, 2022, 2:29am

Scenario 1 is correct - each client waits a random amount of time between 0 seconds and 15 minutes before starting the job. You’ll see this in the client log as “temporal distribution”. No client knows how long any other client is waiting.
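
A minimal sketch of that per-client random wait, for anyone who wants to see how it differs from the "2,500 endpoints every 15 minutes" model. This is an illustration only, not BigFix client code: the function name is hypothetical, and the uniform distribution is an assumption, since the thread only says the wait is "random".

```python
import random

STAGGER_SECONDS = 15 * 60  # "stagger over 15 minutes" from the example in this thread


def draw_stagger_delays(num_clients, stagger_seconds, seed=None):
    """Return one independent random delay (in seconds) per client.

    Assumption: delays are uniform over [0, stagger_seconds]. The real
    client may use a different distribution.
    """
    rng = random.Random(seed)
    return [rng.uniform(0, stagger_seconds) for _ in range(num_clients)]


delays = draw_stagger_delays(10_000, STAGGER_SECONDS, seed=42)

# Count how many clients would start in each 5-minute slice of the window.
buckets = [0, 0, 0]
for d in delays:
    buckets[min(int(d // 300), 2)] += 1

print("latest start (s):", round(max(delays), 1))
print("starts per 5-minute slice:", buckets)
```

Under that assumption the starts are spread across the whole 15-minute window rather than released in fixed batches, and no client waits longer than the stagger value.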

In scenario 2, if the stagger time is too long and a client doesn’t begin execution before the action expiration time, that action may not execute at all.
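
To make that concrete, here is a rough sketch of the timing check. The function name and the timeline values are hypothetical, and it assumes the random wait is counted from the moment the action becomes relevant on the client, as described in the original question.

```python
def starts_before_expiry(relevant_at_s, delay_s, expires_at_s):
    """Return True if the client would begin executing before the action expires.

    All arguments are seconds since the action was issued.
    Assumption: the stagger delay starts counting when the action becomes
    relevant on the client.
    """
    return relevant_at_s + delay_s < expires_at_s


ACTION_LIFETIME_S = 60 * 60  # action set to last 1 hour

# Client A sees the action immediately and draws a 30-minute wait.
client_a = starts_before_expiry(0, 30 * 60, ACTION_LIFETIME_S)

# Client B only sees the action 40 minutes in and draws the same wait.
client_b = starts_before_expiry(40 * 60, 30 * 60, ACTION_LIFETIME_S)

print("client A starts in time:", client_a)
print("client B starts in time:", client_b)
```

In this sketch client B's start time falls after the expiration, which is the case where the action may not execute at all.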

If the action triggers an external process through ‘wait’ or ‘run’, that external task will not be stopped and it might continue executing after the action expires.

If sending a Baseline / Multiple Action Group, when the group crosses the expiration time the next component action will not be executed.

One common pitfall is that when staggering a Group Action, each component of the group gets its own separate random stagger. With 60 components and a one-minute stagger, there could be as much as 60 minutes of stagger across the group action.
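
The arithmetic behind that pitfall, as a small sketch. The function names are hypothetical, and it assumes each component really does draw its own independent wait (uniform here, purely for illustration); later replies in this thread report seeing only a single stagger for actions taken on baselines.

```python
import random


def worst_case_group_stagger(num_components, stagger_seconds):
    """Upper bound on total stagger if every component waits the full time."""
    return num_components * stagger_seconds


def sample_group_stagger(num_components, stagger_seconds, seed=None):
    """One simulated total, assuming an independent uniform wait per component."""
    rng = random.Random(seed)
    return sum(rng.uniform(0, stagger_seconds) for _ in range(num_components))


worst = worst_case_group_stagger(60, 60)    # 60 components, one-minute stagger
sample = sample_group_stagger(60, 60, seed=7)

print("worst case (minutes):", worst / 60)
print("one simulated total (minutes):", round(sample / 60, 1))
```

The worst case is 60 x 60 seconds, i.e. 60 minutes of accumulated waiting, even though the configured stagger is only one minute.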

Where I’m not positive, and maybe @AlanM can weigh in, is whether an action that is in the middle of execution when it expires will run to completion or be stopped partway through. An action that is Stopped in the Console will be abandoned on the client; I’m just not sure about an action that was still running at the expiration time.

FatScottishGuy · June 17, 2022, 7:57am

Does my extreme case theory of being complete in 15 minutes stand with this in mind?

mesee2 · June 17, 2022, 8:03am

To clarify: the “Stagger action start times over” setting on the Execution tab is not just staggering the start of the baseline, but also the sub-actions within the baseline once it starts?

You mention Group Action… you mean the delays may be introduced within the baseline execution at the “Component Group” boundaries within the Baseline action?

This does not appear to be what I see in the logs during patching.

What I read once, and do see more often than not, is that setting “stagger action start times over” to 10 mins will usually cause the job to start anywhere between 1 and 20 mins later (i.e. up to double the defined time).

trn · June 17, 2022, 8:33am

I also see that actions taken on baselines just have the one “temporal distribution” entry in the client logs - perhaps a difference between actions taken on a baseline and group actions taken by picking an ad-hoc selection of fixlets?

There was the ‘feature’ of the delay being doubled on actions (certainly on baseline actions) a few versions ago. I’m not actually sure whether this has been fixed - I’ll have to do some testing so I know what the current situation is.