Why would this pause code come back as failed occasionally?

I’m trying to ensure that NFS is up and ready before trying to mount an NFS filesystem on a Solaris box. I was having problems mounting an NFS filesystem after bringing a machine up in single-user mode (with a reboot -- -s) and then starting NFS. So I created the following task to wait until the nfsmapid process has started, and then give it another 2 minutes just to play it safe:

parameter "startTime_000"="{now}"

// Wait until the nfsmapid process starts, but don't wait for more than 8 minutes.
pause while { ( (now-time(parameter "startTime_000") < 480*second) AND not exist(process "nfsmapid") ) }

parameter "startTime_001"="{now}"

// Wait an additional 2 minutes just to be sure.
pause while { (now-time(parameter "startTime_001") < 120*second) }

Now, when I place this in a baseline between the reboot -- -s task and the Critical Patch Update in which I use the NFS option, it sometimes comes back as Completed and sometimes as Failed. Why might this be showing up as Failed at times, and is there anything I can do to make sure this task shows as Completed when it finishes?

Thanks,
BobK

Is this a fixlet or a task?

When it fails, does it fail on a specific line? Does it have a different exit code than when it doesn’t fail?

strawgate,

It was created as a task.

It failed on the first pause line

and the Exit Code was None

BobK

What does the client log say when it fails?


AlanM,

Here is where the task was called. It looks like the Delay task starts after the task that brings the machine down to single-user mode has been started, but before the system actually reboots. I was expecting it to start AFTER the system came up in single-user mode. Right after the "Not paused" and "Paused" pause lines, the log shows that the client was shut down.

   Command succeeded parameter "startTime_000"="Fri, 25 Mar 2016 11:16:04 -0500" (group:83412,action:83415)
   Not paused pause while False (group:83412,action:83415)
   Command succeeded parameter "startTime_001"="Fri, 25 Mar 2016 11:16:04 -0500" (group:83412,action:83415)
   Paused pause while True (group:83412,action:83415)
At 11:16:12 -0500 -
   Client shutdown (Service manager stop request)

Current Date: March 25, 2016
   Client version 9.2.1.48 built for Solaris 10 Sparc
   Current Balance Settings: Use CPU: True Entitlement: 0 WorkIdle: 10 SleepIdle: 480
   ICU data directory: '/var/opt/BESClient'
   ICU init status: SUCCESS
   ICU report character set: ISO_8859-1:1987
   ICU fxf character set: ISO_8859-1:1987
   ICU local character set: ISO_8859-1:1987
   ICU transcoding between fxf and local character sets: DISABLED
   ICU transcoding between report and local character sets: DISABLED
At 11:18:24 -0500 -
   Starting client version 9.2.1.48

Is there a way to ensure this task doesn’t start until after BESClient restarts?

Thanks,
BobK

CORRECTION:

The log file should look like this (the excerpt above was actually from a run where the Delay task showed as Completed):

   Command succeeded parameter "startTime_000"="Fri, 25 Mar 2016 11:16:09 -0500" (group:83412,action:83415)
   Paused pause while True (group:83412,action:83415)
At 11:16:10 -0500 - mailboxsite (http://<ROOTSERVER>/cgi-bin/bfgather.exe/mailboxsite10874007)
   Not Relevant - Solaris - OSERV (fixlet:83413)
   Not Relevant - Single-User Mode Task - Solaris (clean SMF) A (fixlet:83414)
At 11:16:12 -0500 -
   Report posted successfully
At 11:16:19 -0500 -
   Client shutdown (Service manager stop request)

It still runs the Delay script prior to shutdown, which is not what I expected…

Thanks,
BobK

Should I consider using svcadm stop BESClient at the end of the shutdown task to make sure the Delay task isn’t kicked off until I restart BESClient when the machine comes up in Single-User mode?

I know, I’m rambling…

Ramblin’ ramblin’ rambling’…

BobK

If an action is running when the client shuts down, it is always marked as failed when the client comes back up, unless the action has custom success criteria. So this would be expected if the action is still running at shutdown.

This really seems like a good use-case for the server automation functionality in Lifecycle.

If you can’t use server automation, could you have the shutdown task output {now} to a file, and then have your delay task run only if the time in the file is earlier than the computer’s last boot time (now minus uptime)?

  1. Output {now} to /tmp/bfxdelay
  2. Computer Reboots
  3. Delay fixlet compares the time stored in /tmp/bfxdelay to the last boot time (now minus uptime)
    a. if the time in /tmp/bfxdelay is newer than the last boot time, we haven’t rebooted yet, go back to 3
    b. if the time in /tmp/bfxdelay is older than the last boot time, we have rebooted
  4. Run Delay Fixlet
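
A minimal sketch of step 1, assuming it is added to the end of the shutdown task's action script just before the reboot is issued. The file name is hypothetical, and it is placed under /var/tmp rather than /tmp because /tmp on Solaris is normally a tmpfs that is emptied at boot:

```
// Build the timestamp in the client's temporary __appendfile
delete __appendfile
appendfile {now}

// Replace any stale copy from an earlier run, then publish the new one
// (hypothetical path; delete does not fail if the file is absent)
delete /var/tmp/bfxdelay
move __appendfile /var/tmp/bfxdelay
```

The file then holds a single line with the time the task ran, which relevance can read back with line 1 of file "/var/tmp/bfxdelay" as time.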

@strawgate

How would you implement step 3?

Thanks,
BobK

Something like:

Q: line 1 of file "C:\windows\temp\datetime.txt" as time
A: Wed, 30 Mar 2016 07:54:30 -0500

Q: now - uptime of operating system
A: Thu, 17 Mar 2016 14:42:26 -0500

Q: line 1 of file "C:\windows\temp\datetime.txt" as time < now - uptime of operating system
A: False

Is there any relevance you could use to determine if the system is in single user mode?

@strawgate

I thought that the fixlet that would check for time of reboot could fail due to the reboot, or could complete before the system has rebooted.

BobK

It will certainly be susceptible to random reboots, but it won’t fail as a result of the reboot in your baseline, because the next step won’t even be applicable until after the reboot has occurred. (We are checking for the reboot in relevance, not in actionscript.)

Make sense?

@strawgate

OK…

If the next step isn’t applicable until after the reboot, what will prevent the baseline from continuing to the next component if the delay fixlet is showing as not being applicable?

Hopefully I’m not being thick…

Thanks,
BobK

Could you just make it into more than one baseline? The first baseline preps the machine and the second one is only applicable if

line 1 of file "C:\windows\temp\datetime.txt" as time < now - uptime of operating system

@strawgate,@AlanM,

What I have, at the moment, is a fixlet with the following:

Action Script:

// Wait until system has rebooted before continuing
pause while { line 1 of file "/var/tmp/reboot_delay.time" as time > now - uptime of operating system }

// After pause completes, clean up the temporary file that contains the date stamp
delete /var/tmp/reboot_delay.time

Relevance: (In addition to OS checking…)

(exists file "/var/tmp/reboot_delay.time") AND (line 1 of file "/var/tmp/reboot_delay.time" as time > now - uptime of operating system)

In the baseline, I have a component (task) that brings the Solaris OS down and then up to single-user mode with a reboot -- -s, and then starts up BESClient and the NFS processes.

The next component is the above fixlet, which keeps any other components from running until the system is up in single-user mode. Then, when the BigFix client starts up, it continues with the next component, which checks that the NFS service has started (a pause looking for the nfsmapid process, waiting no more than 10 minutes). When that completes, the Solaris Critical Patch Update fixlet for the current quarter starts, using Action 3, which performs an NFS mount and starts the install from the CPU bundle that has already been unzipped on the NFS mount.
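
For reference, the NFS check component could look roughly like the sketch below. It is adapted from the pause in the first post with the timeout raised to 10 minutes. The parameter name is hypothetical, and the final continue if line is an assumption added here so that the action fails outright if nfsmapid never appears, rather than letting the patch component run without NFS:

```
parameter "nfsWaitStart"="{now}"

// Wait until the nfsmapid process starts, but no more than 10 minutes
pause while { ( (now-time(parameter "nfsWaitStart") < 600*second) AND not exist(process "nfsmapid") ) }

// Assumption: stop here (action fails) if the wait timed out without nfsmapid
continue if { exist(process "nfsmapid") }
```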

On the test machines I have tried so far, this seems to be working like a charm. If I start to see problems, I’ll split the baseline into two pieces. The first will do the shutdown to single-user mode (reboot -- -s) and write the time to the file. The second will have relevance that looks for that file and checks the time and, when relevant, will perform the patching and post-patching activities, including bringing the system back up in multi-user mode.
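
If it comes to that, the relevance for the second baseline would presumably be the reverse of the check in the fixlet above: relevant only once the timestamp in the file is older than the last boot. A sketch, assuming the same file path and in addition to the OS checks:

```
(exists file "/var/tmp/reboot_delay.time") AND (line 1 of file "/var/tmp/reboot_delay.time" as time < now - uptime of operating system)
```

With this arrangement the cleanup (delete /var/tmp/reboot_delay.time) would have to move to the end of the second baseline, so the file is still there when its relevance is evaluated.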

Thanks for all your help,
BobK

