Strange issue with action script

All, I’m having a weird issue with an action script and would like to see if anybody has any insights/ideas.
Background: I’m working on a project to handle agent healing: checking whether each agent is running, trying to remediate if it is not, and performing various checks on the agent (is it running, is it reporting to its backend, etc.). I started with an individual action script for each agent (6 in total). They have to run every 15 minutes and report to BigFix settings/properties, and that info is then imported into a Power BI dashboard for reporting purposes. This is running in our environment (140,000 devices) across different OSs such as Windows, macOS, and Linux (various flavours).

We have been running this for several months. To simplify things, I planned to merge the 6 individual action scripts into 1 script (without comments and blank lines, the script is about 300 lines) that would run every 15 minutes. When I targeted it at an initial test group (1,000 devices), everything worked fine. Now that I have targeted my whole environment, strangely enough, around 1,200 devices report the action as “”. I captured some logs and they show a “Relevance Substitution Error” message. If I retarget the same device with the same script, it works fine. The same hosts also worked fine when they were targeted with the individual scripts.

The idea is to maintain only 1 script (with plenty of comments to simplify updates to specific sections). The errors happen across different OSs, not one specific OS. The other issue is that because the script returns an error, it is not retried, so the net result is that I get no info from those agents. For the time being I’ve re-issued the original scripts for each agent individually, but I’m hoping somebody has ideas or suggestions to resolve this problem.

I’m running a 10.0.7.x environment. OS flavours: Windows 10; Windows Server 2012/2016/2019; Linux (Red Hat, Ubuntu, CentOS, SUSE).

Thx.

It’s hard to say where the errors lie without seeing anything. That said, remember that when you combine multiple fixlets you are also combining the relevance that goes with them. You will often see that error when the client tries to evaluate parts of the fixlet that it hasn’t even got to yet, because the relevance is evaluated for the whole fixlet before the action runs.

One of the ways I found to combat this was using prefetch blocks (if you are using downloads in the fixlet) - that way it can bypass a lot of the relevance checks.

I want to add a bit more context on this, and thanks to those who have already replied.
The action targets multiple OSs, and the error can happen even on endpoints where the action has run successfully multiple times (in one instance it ran successfully 40 times and then suddenly returned an error).

I’ve updated the action script so that before a certain action/command runs, it checks that the command exists on the endpoint. The same is true for populating parameters/settings from a text file: it first validates that the text file exists and has content. I’m also using the “|” option in case of failure. This has already improved the results, but it’s not yet completely bulletproof.

What is frustrating is that when the action returns “error” it is not retried, which is different behaviour from an action that failed. The retry mechanism is set to 3 times with a 15-minute interval.
The action is rerun every hour on the device to get ongoing info about the specific agents installed on it.

With regard to the comment about relevance: the action script needs to run on every endpoint. I’ve got logic inside the action script that uses if-then-else clauses to determine whether a certain part of the script needs to run.

ex. if {(exists setting “_PG_AgentInstalled_Test” whose (value of it as string as lowercase = “y”) of client)}

The error that’s returned is “Invalid action content: the action script contains a syntax error.”
The same script runs fine on other machines with the same config and OS, and it may also have run fine on that same endpoint multiple times before erroring out.

I’ve managed to find some issues by enabling “debug” logging and deploying only certain parts of the script, but as the errors occur almost randomly it’s very hard to determine where to enable debugging, and I certainly don’t want to enable it broadly.

Currently fewer than 1% of devices experience this issue, which in our environment still translates to about 1,000 devices globally, which is a lot.

Will continue investigating and improving but any ideas/suggestions especially around error handling are much appreciated.

Thx.

Without seeing the actual script it’s hard to guess what may be happening.

Are those “smart Quotes”?

if {(exists setting “_PG_AgentInstalled_Test” whose (value of it as string as lowercase = “y”) of client)}

Ensure they are normal (straight) quotes:

if {(exists setting "_PG_AgentInstalled_Test" whose (value of it as string as lowercase = "y") of client)}
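
If the script is edited outside the console (for example pasted from a document or web page), curly quotes can slip in unnoticed. As a rough sketch only, a small Python check like the one below could scan a script file's text before deployment. The function names `find_smart_quotes` and `straighten_quotes` are hypothetical helpers made up for this illustration, not part of BigFix.

```python
# Curly quote characters that word processors and web pages often substitute
# for the straight " and ' used in action script and relevance.
SMART_QUOTES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}


def find_smart_quotes(script_text):
    """Return (line_number, column, character) for every curly quote found."""
    hits = []
    for line_no, line in enumerate(script_text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SMART_QUOTES:
                hits.append((line_no, col, ch))
    return hits


def straighten_quotes(script_text):
    """Return a copy of the script with curly quotes replaced by straight ones."""
    return script_text.translate(str.maketrans(SMART_QUOTES))


if __name__ == "__main__":
    sample = "if {(exists setting \u201c_PG_AgentInstalled_Test\u201d of client)}"
    for line_no, col, ch in find_smart_quotes(sample):
        print(f"line {line_no}, column {col}: {ch!r}")
    print(straighten_quotes(sample))
```

Reviewing the reported positions by hand is probably safer than replacing blindly, since a curly quote inside a comment or a literal message may be intentional.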

Assuming you already have that handled, you can also improve your troubleshooting by breaking things into smaller pieces and using parameters to test things along the way (so they show up in the client logs):

Parameter "1" = "{exists client}"
Parameter "2" = "{exists setting "_PG_AgentInstalled_Test" of client}"
Parameter "3" = "{exists value of setting "_PG_AgentInstalled_Test" of client}"
Parameter "4" = "{value of setting "_PG_AgentInstalled_Test" of client}"
Parameter "5" = "{value of setting "_PG_AgentInstalled_Test" of client as lowercase}"
Parameter "6" = "{(exists setting "_PG_AgentInstalled_Test" whose (value of it as string as lowercase = "y") of client)}"
if {(exists setting "_PG_AgentInstalled_Test" whose (value of it as string as lowercase = "y") of client)}
parameter "madeItInsideOfTheIFLoop" = "true" 
endif

Another thought - make sure that none of your Parameter statements are duplicated in your converged script.
Once you set a Parameter, you cannot set it a second time in the same action script…
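
With about 300 lines merged from 6 scripts, duplicates are easy to miss by eye. Here is a minimal Python sketch that lists parameter names set on more than one line. It assumes each parameter statement sits on its own line in the form `parameter "name" = ...`, matches the keyword case-insensitively, and compares names exactly as written (whether the client treats names as case-sensitive is not something this sketch knows). `find_duplicate_parameters` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches lines such as:  parameter "name" = "value"
# The keyword is matched case-insensitively ("parameter" or "Parameter").
PARAMETER_LINE = re.compile(r'^\s*parameter\s+"([^"]+)"\s*=', re.IGNORECASE)


def find_duplicate_parameters(script_text):
    """Map each parameter name set more than once to the line numbers that set it."""
    seen = defaultdict(list)
    for line_no, line in enumerate(script_text.splitlines(), start=1):
        match = PARAMETER_LINE.match(line)
        if match:
            seen[match.group(1)].append(line_no)
    # Keep only names that appear on two or more lines.
    return {name: lines for name, lines in seen.items() if len(lines) > 1}


if __name__ == "__main__":
    script = "\n".join([
        'parameter "agentState" = "unknown"',
        "if {exists client}",
        'Parameter "agentState" = "running"',
        "endif",
    ])
    print(find_duplicate_parameters(script))
```

This is a purely textual check: it also flags a name that is set in two different if/else branches, so treat the output as a list of places to review rather than a list of confirmed errors.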
