Action showing 'failed' on console, but apparently successful on client

(imported topic written by menke123491)

I often have problems on my Solaris clients where the console shows an action status of ‘failed’, but the logs and relevance (via qna) locally on the client show apparent success.

Console - 7.1.1.315

Client OS - SunOS 5.9 & 5.10

Client BES ver - 6.0.16.46 & 7.1.1.315

Action - script type sh

  • success criteria - relevance evaluates to false (default)

After I kick off the action (even before the status is reported to the console), I check __BESData/__Global/Logs/200901xx.log and see “Relevant - fixlet_name…”, then “Not Relevant - fixlet_name…”. I also increased logging verbosity on _BESClient_EMsg_Detail and see something similar. I also run the same relevance through qna locally on the client, and it reports false. Lastly, if it matters, the bash script has an explicit exit code 0 at the end, and I’m verifying that it exits as such via a local log file created by the script.

However, the console reports status “Failed”, and the server still shows the client as relevant for the fixlet. (Only when I manually restart the BESClient daemon does the action change to “Fixed” and the server stop showing the client as relevant for the fixlet.)

Am I missing something? Is the client reporting back to the console too quickly or just inaccurately? Any easy fix?

I’ve only seen this behavior with Solaris clients and only with script type sh.
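To line the script's own record up against the timestamps in the BESClient log, each step of the script could be wrapped so that its exit status is logged with the time of day. This is a minimal sketch, assuming a plain Bourne `/bin/sh`; `run_step` and `STEP_LOG` (with its `/tmp` default) are hypothetical names, not part of BigFix.

```shell
#!/bin/sh
# Sketch only: run_step and STEP_LOG are hypothetical; point STEP_LOG at
# wherever the script already writes its own log.
STEP_LOG=${STEP_LOG:-/tmp/patch-steps.log}

# run_step CMD [ARGS...]
# Runs CMD, appends a timestamped line with its exit status to STEP_LOG,
# and returns that same status so the caller can still react to failures.
# Uses backticks rather than $(...) so it also works in the Solaris Bourne sh.
run_step() {
    "$@"
    rc=$?
    echo "`date '+%Y-%m-%d %H:%M:%S'` rc=$rc: $*" >> "$STEP_LOG"
    return $rc
}
```

For example, `run_step pkginfo -q SAMPAgtBF` records whether Solaris `pkginfo` found the package at that moment (`-q` reports only through the exit status). Because `run_step` passes the command's status back, the script can still decide what its final exit code should be.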

(imported comment written by BenKus)

Hey Menke,

Couple notes:

  • Do the “Relevant - fixlet_name…” and “Not Relevant - fixlet_name…” lines contain the ID of the Fixlet, or of the Action? If it is the action, then this just means that the action executed, and you should see another “Not Relevant - fixlet_name…” with the Fixlet ID a little later.

  • When the action completes, the agent double-checks the relevance of the Fixlet to see if it has gone false; if it has, the agent is supposed to update the server immediately… However, if the action ends and the relevance isn’t false yet (for example, if you use “run” instead of “wait”, the command might not be finished), then the agent won’t notice the Fixlet is not relevant until the next pass (which could be a few minutes or tens of minutes later, depending on how much other work the agent is doing)… Note that the agent should later mark the Fixlet as “Not Relevant” when it gets around to checking it again…

Ben

(imported comment written by menke123491)

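Following is the relevant part of the log. I interpret it to mean the fixlet relevance resolved to false (the “Not Relevant - BMC Patrol Agent Patch (fixlet:2836)” entry at 14:40:41) before the action's status did (the “Not Relevant - Status of Action 2836” entry at 14:41:19). Isn't that counterintuitive?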

Regarding your second point, I’m using action type “sh”, so there’s no “wait” or “run”. Can I assume it behaves like “wait” and will finish the script before it tries to evaluate the relevance again? Is there anything I can do to influence the order of events or timing? Set a sleep in the script?

Lastly, is the empty entry below (the “At 14:40:51” timestamp with nothing after it) a problem? I don’t see many (if any) entries like that in the same logs on Linux boxes.

At 14:40:37 -0500 -

GatherAction command received

GatherAction: Version difference, gathering action site

At 14:40:38 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

Download ‘http://bigfixservername:52311/bfsites/actionsite_2873/__diffsite2872

At 14:40:38 -0500 -

Gather merging new file /var/opt/BESClient/__BESData/actionsite/Action 2836.fxf

At 14:40:38 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

Successful Synchronization with FixSite (version 2873) - ‘http://bigfixservername:52311/cgi-bin/bfenterprise/BESGatherMirror.exe?url=http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite

At 14:40:39 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

Relevant - BMC Patrol Agent Patch (fixlet:2836)

Relevant - Status of Action 2836 (fixlet:2147486484)

At 14:40:39 -0500 -

ActionLogMessage: (action 2836 ) starting action

At 14:40:40 -0500 -

Report posted successfully.

At 14:40:41 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

Not Relevant - BMC Patrol Agent Patch (fixlet:2836)

At 14:40:51 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

At 14:41:06 -0500 -

Report posted successfully.

At 14:41:19 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

Not Relevant - Status of Action 2836 (fixlet:2147486484)

(imported comment written by BenKus)

Hi menke,

I am not super-familiar with the sh action script type myself… As a test, you might want to create the shell script as a file and then execute it using BigFix ActionScript… for instance:

// create a file with shell script commands using the “appendfile” action, for instance find a pattern in the file, then output a file with the results
appendfile #!/bin/sh
appendfile grep somepattern filetodownload | wc -l > outputfile

// rename the script
move __appendfile myscript

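// give permissions to run the script (wait, unlike run, does not continue until chmod has finished)
wait chmod 755 myscript

// run the script and wait for it to finish (./ because the current directory may not be on the PATH)
wait ./myscript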

This should at least give you some better logging in the BigFix logs.

Ben

(imported comment written by menke123491)

I made the recommended changes but am still seeing the same behavior: the action reports failed even though the relevance evaluated locally (via qna) reports false. I don’t see anything in the new action's log output that indicates where the problem is occurring. Anything stand out to you?

Also, I noticed we’re never seeing a line with “Fixed - fixlet_name” (despite the relevance evaluating to false).

At 11:55:27 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

Relevant - BMC Patrol Agent Patch (fixlet:2870)

Start monitoring action - Status of Action 2870 (fixlet:2147486518)

At 11:55:27 -0500 -

ActionLogMessage: (action 2870 ) action signature verified

ActionLogMessage: (action 2870 ) starting action

At 11:55:27 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

Command succeeded createfile until ENDCREATE (fixlet 2870)

Command succeeded move __createfile patragentupdate.sh (fixlet 2870)

Command succeeded run chmod 744 patragentupdate.sh (fixlet 2870)

At 11:55:28 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

Not Relevant - BMC Patrol Agent Patch (fixlet:2870)

At 11:55:35 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

End monitoring action - Status of Action 2870 (fixlet:2147486518)

At 11:55:41 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

Command succeeded wait ./patragentupdate.sh (fixlet 2870)

At 11:55:42 -0500 -

ActionLogMessage: (action 2870 ) ending action

At 11:55:43 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

Start monitoring action - Status of Action 2870 (fixlet:2147486518)

At 11:55:55 -0500 -

Report posted successfully.

At 11:56:07 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

End monitoring action - Status of Action 2870 (fixlet:2147486518)

(imported comment written by menke123491)

I decided to play around with the relevance (not the action) to see if that impacted the success/failure. It looks like it does. I ran through 3 or 4 of each, and the results were consistent each time. Following are (a) the original relevance and resulting log (failure) and (b) an alternate relevance and resulting log (success).

The original relevance checks for (a) a certain OS, (b) presence of the “SAMPatrol” package, and (c) absence of the “SAMPAgtBF” package, or a version of it lower than a certain version. Note that the relevance reports false (via qna) within seconds of the fixlet running.

The alternate relevance just checks for (a) a certain OS and (b) absence of a log file created by the fixlet script. The action is the same. Note that the log now reports “Fixed”.

Is anything wrong with the original relevance? Could the BESClient be checking “pkgdb” before “pkgdb” registers that the package has actually been installed?

original relevance - failure

if (name of operating system contains "SunOS") then ((exists pkginfo "SAMPatrol" of pkgdb) and (not exists ((pkginfo "SAMPAgtBF" of pkgdb) whose (version of it >= "2.0")))) else if (name of operating system contains "Linux") then ((exists package "SAMPatrol" of rpm) and (not exists ((package "SAMPAgtBF" of rpm) whose (version of it >= "2.2-2.2")))) else (false)

At 15:44:20 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

Relevant - BMC Patrol Agent Patch-mje (fixlet:2883)

Start monitoring action - Status of Action 2883 (fixlet:2147486531)

At 15:44:20 -0500 -

ActionLogMessage: (action 2883 ) action signature verified

ActionLogMessage: (action 2883 ) starting action

At 15:44:21 -0500 -

Report posted successfully.

At 15:44:23 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

Not Relevant - BMC Patrol Agent Patch-mje (fixlet:2883)

At 15:44:32 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

End monitoring action - Status of Action 2883 (fixlet:2147486531)

At 15:44:34 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

At 15:44:34 -0500 -

ActionLogMessage: (action 2883 ) ending action

At 15:44:38 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

Start monitoring action - Status of Action 2883 (fixlet:2147486531)

At 15:44:49 -0500 -

Report posted successfully.

At 15:45:01 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

End monitoring action - Status of Action 2883 (fixlet:2147486531)

bash-3.00# date ; /opt/BESClient/bin/qna.sh /opt/BESClient/bin/qnainput.txt

Tue Jan 20 15:44:39 EST 2009

BESClientConfigPath must be set

BESClientActionMastheadPath not set, using /etc/opt/BESClient/actionsite.afxm

Q: if (name of operating system contains "SunOS") then ((exists pkginfo "SAMPatrol" of pkgdb) and (not exists ((pkginfo "SAMPAgtBF" of pkgdb) whose (version of it >= "2.0")))) else if (name of operating system contains "Linux") then ((exists package "SAMPatrol" of rpm) and (not exists ((package "SAMPAgtBF" of rpm) whose (version of it >= "2.2-2.2")))) else (false)

A: False

================================

alternate relevance - success

if name of operating system contains "SunOS" then (not exists file "/tmp/SAMPAgtBF-2.bf.debug") else (false)

At 15:37:40 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

Relevant - BMC Patrol Agent Patch-mje (fixlet:2882)

Start monitoring action - Status of Action 2882 (fixlet:2147486530)

At 15:37:40 -0500 -

ActionLogMessage: (action 2882 ) action signature verified

ActionLogMessage: (action 2882 ) starting action

At 15:37:40 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

Fixed - BMC Patrol Agent Patch-mje (fixlet:2872)

At 15:37:41 -0500 -

Report posted successfully.

At 15:37:43 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

Not Relevant - BMC Patrol Agent Patch-mje (fixlet:2882)

At 15:37:53 -0500 - actionsite (http://bigfixservername:52311/cgi-bin/bfgather.exe/actionsite)

At 15:37:53 -0500 -

ActionLogMessage: (action 2882 ) ending action

At 15:38:09 -0500 -

Report posted successfully.

(imported comment written by BenKus)

Hey Menke,

That sounds very plausible… If the pkgdb has some sort of cache updating scheme, then this could cause the issue you are looking at… Perhaps add a 60 second wait at the end of your action?
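A fixed sleep only guesses at how long the package database takes to catch up. As an alternative, the script could poll until the package is registered, with an upper limit. This is a minimal sketch, assuming Solaris `pkginfo -q` (which reports presence only through its exit status); `wait_until` is a hypothetical helper name, not a BigFix or Solaris command.

```shell
#!/bin/sh
# wait_until MAX_SECONDS CMD [ARGS...]   (hypothetical helper)
# Re-runs CMD roughly once per second until it exits 0 or MAX_SECONDS
# have passed. Returns 0 if CMD succeeded within the limit, 1 otherwise.
# Written for plain Bourne sh (no $(...) or local), as on Solaris 9/10.
wait_until() {
    max=$1
    shift
    waited=0
    while [ "$waited" -lt "$max" ]; do
        if "$@"; then
            return 0
        fi
        sleep 1
        waited=`expr $waited + 1`
    done
    # One final check after the last sleep.
    if "$@"; then
        return 0
    fi
    return 1
}
```

For example, `wait_until 120 pkginfo -q SAMPAgtBF` returns as soon as the package is registered, or fails after roughly two minutes. Note that `pkginfo -q` checks presence only; the relevance also compares the version (`>= "2.0"`), which `pkginfo -l SAMPAgtBF` shows in its VERSION field.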

Ben

(imported comment written by menke123491)

Unfortunately, that wasn’t it. I added a 60 second sleep at the end of the script, but had the same problem. I also ran qna with an input file at different times in the script, i.e. before the actual package install, just after the package install, and after the 60 second sleep. The relevance changes from true to false immediately after the package install. I also changed around the logic in the relevance just to see (using an if…then instead of an and), but that didn’t make a difference either. I guess I’ll need to open a support ticket. Thanks.
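The qna checkpoints described above could be scripted so that each result is stamped with its point in the script and the time of day, for comparison with the BESClient log. This is a minimal sketch, assuming `qna.sh` accepts an input file as in the earlier session and prints answers prefixed `A:` (and errors prefixed `E:`); `checkpoint` and `CHECK_LOG` are hypothetical names.

```shell
#!/bin/sh
# QNA and QNA_INPUT default to the paths used earlier in this thread;
# CHECK_LOG is a hypothetical location for the results.
QNA=${QNA:-/opt/BESClient/bin/qna.sh}
QNA_INPUT=${QNA_INPUT:-/opt/BESClient/bin/qnainput.txt}
CHECK_LOG=${CHECK_LOG:-/tmp/relevance-checks.log}

# checkpoint LABEL
# Evaluates the relevance in QNA_INPUT and appends the label, a timestamp,
# and only qna's answer/error lines ("A: ..." / "E: ...") to CHECK_LOG,
# dropping the startup messages about BESClientConfigPath and the masthead.
checkpoint() {
    echo "== $1 at `date '+%H:%M:%S'`" >> "$CHECK_LOG"
    "$QNA" "$QNA_INPUT" 2>&1 | grep '^[AE]:' >> "$CHECK_LOG"
}
```

Calling `checkpoint "before install"`, `checkpoint "after install"`, and `checkpoint "after sleep"` at those points in the script leaves one timestamped answer per step in `CHECK_LOG`.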