Wait Command Hangs When Trying to Kill Parent Process

Hello,

I’m trying to create a fixlet that looks for zombie processes and kills them by ending their parent process. The issue I’m facing is described below:

The fixlet runs fine and does all the intended work, but it gets stuck on the wait command and never moves forward. I left it for two days and the machine stopped reporting because of this.
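For context on the approach: a zombie has already exited, so no signal (not even SIGKILL) has any effect on it. It disappears only when its parent reaps it, or when the parent itself exits and the orphaned zombie is adopted and reaped by PID 1 (or a subreaper). That is why the fixlet targets the parent. The following demonstration, assuming bash and procps `ps` (as shipped with RHEL/CentOS), deliberately creates a zombie and then clears it by killing its parent:

```shell
#!/bin/bash
# Demonstration: create a zombie on purpose, then clear it by killing its parent.

# The subshell starts a short-lived child and then execs into "sleep 30".
# sleep never calls wait(), so the child is never reaped and becomes a
# zombie as soon as it exits one second later.
( sleep 1 & exec sleep 30 ) &
parent=$!

sleep 2   # give the child time to exit

# STAT of the parent's child; a value starting with Z means zombie.
child_state=$(ps -o stat= --ppid "$parent" | tr -d ' ')
echo "child of $parent has state: $child_state"

# Signals have no effect on the zombie itself. Killing its parent makes the
# kernel re-parent the zombie to PID 1 (or a subreaper), which reaps it.
kill "$parent"
wait "$parent" 2>/dev/null
```

Note that the zombie is only cleaned up if whatever adopts it actually reaps children; on a normal RHEL/CentOS host that is systemd.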

I’m pasting the code below; any help would be greatly appreciated.

//CODE STARTS
action parameter query "ErrorFolder" with description "Please enter the name of folder and path" with default value ""
if { not exists folder (parameter "ErrorFolder")}
folder create {parameter "ErrorFolder"}
endif
delete __createfile
createfile until EOF
#!/bin/bash
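
# Blank separator lines are appended to the same file as log_message() output
LOG_FILE="{parameter "ErrorFolder"}/output.txt"
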
log_message() {
    echo "$(date +"%Y-%m-%d %H:%M:%S") - INFO - $1" >> {parameter "ErrorFolder"}/output.txt
}

log_error() {
    echo "$(date +"%Y-%m-%d %H:%M:%S") - ERROR - $1" >> {parameter "ErrorFolder"}/error.txt
}

# Get all zombie process IDs
ZOMBIES=($(ps -e -o pid,stat | grep 'Z' | awk '{{print $1}'))

if [ ${{#ZOMBIES[@]} -gt 0 ]; then
    log_message "Zombie processes found: ${{ZOMBIES[*]}"
    echo "" >> $LOG_FILE

    for PID in "${{ZOMBIES[@]}"; do
        # Find the parent process ID (PPID)
        PARENT_PID=$(ps -p $PID -o ppid= | tr -d ' ')

        if [ -n "$PARENT_PID" ]; then
            log_message "Attempting to notify parent process $PARENT_PID to clean up zombie $PID"
            echo "" >> $LOG_FILE

            # Send SIGCHLD signal to the parent process to notify it of child status change
            #kill -SIGCHLD $PARENT_PID > /dev/null 2>&1 || true
            #pkill -9 -P $PARENT_PID > /dev/null 2>&1


            # Sleep for 4 seconds to allow the parent process to handle the zombie
            #sleep 4

            # Check if the zombie process still exists
            if ps -p $PID > /dev/null 2>&1; then
                log_message "Zombie process $PID not cleaned up by parent process $PARENT_PID, forcefully terminating it"
                echo "" >> $LOG_FILE

                # Forcefully terminate the zombie's parent (a zombie itself cannot be killed)
                kill -9 $PARENT_PID > /dev/null 2>&1
                #pkill -9 -P $PARENT_PID > /dev/null 2>&1
                if [ $? -ne 0 ]; then
                    log_error "Failed to forcefully terminate parent process $PARENT_PID of zombie $PID."
                else
                    log_message "Successfully forcefully terminated parent process $PARENT_PID of zombie $PID."
                    echo "" >> $LOG_FILE
                fi
            else
                log_message "Zombie process $PID cleaned up by parent process $PARENT_PID."
                echo "" >> $LOG_FILE
            fi
        else
            log_error "Unable to find parent process ID for zombie $PID."
            echo "" >> $LOG_FILE
        fi
    done

    echo "" >> $LOG_FILE
    log_message "Going into sleep state for 60 seconds before rechecking for zombie processes..."
    #sleep 60

    # Check again for any remaining zombie processes
    ZOMBIES=($(ps -e -o pid,stat | grep 'Z' | awk '{{print $1}'))
    if [ ${{#ZOMBIES[@]} -gt 0 ]; then
        log_message "Additional zombie processes found after 60 seconds: ${{ZOMBIES[*]}"
        echo "" >> $LOG_FILE
        log_message "Attempting to clean them up..."
        echo "" >> $LOG_FILE

        for PID in "${{ZOMBIES[@]}"; do
            PARENT_PID=$(ps -p $PID -o ppid= | tr -d ' ')
            if [ -n "$PARENT_PID" ]; then
                log_message "Attempting to notify parent process $PARENT_PID to clean up zombie $PID"
                echo "" >> $LOG_FILE

                # Send SIGCHLD signal to the parent process
                #kill $PARENT_PID > /dev/null 2>&1 || true

                # Sleep for 4 seconds to allow the parent process to handle the zombie
                #sleep 4

                # Check if the zombie process still exists
                if ps -p $PID > /dev/null 2>&1; then
                    log_message "Zombie process $PID not cleaned up by parent process $PARENT_PID, forcefully terminating it"
                    echo "" >> $LOG_FILE

                    # Forcefully terminate the zombie's parent (a zombie itself cannot be killed)
                    kill -9 $PARENT_PID > /dev/null 2>&1

                    if [ $? -ne 0 ]; then
                        log_error "Failed to forcefully terminate parent process $PARENT_PID of zombie $PID."
                        echo "" >> $LOG_FILE
                    else
                        log_message "Successfully forcefully terminated parent process $PARENT_PID of zombie $PID."
                        echo "" >> $LOG_FILE
                    fi
                else
                    log_message "Zombie process $PID cleaned up by parent process $PARENT_PID."
                    echo "" >> $LOG_FILE
                fi
            else
                log_error "Unable to find parent process ID for zombie $PID."
                echo "" >> $LOG_FILE
            fi
        done
    else
        log_message "No additional zombie processes found after 60 seconds."
        echo "" >> $LOG_FILE
    fi
else
    log_message "No zombie processes found."
    echo "" >> $LOG_FILE
fi
EOF
delete "{parameter "ErrorFolder"}/cpu.sh"

move __createfile "{parameter "ErrorFolder"}/cpu.sh"
run chmod 775 "{parameter "ErrorFolder"}/cpu.sh"
override wait
wait /bin/bash "{parameter "ErrorFolder"}/cpu.sh"



if {exists file "error.txt" of folder (parameter "ErrorFolder")}
exit 100
endif

//CODE ENDS

I have also tried using the run command instead, but I need exit code 0 with a Fixed status, which I don’t get with run. I have also tried a plain kill instead of kill -9, but that shuts down my client. Any help would be appreciated.
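The missing exit code with `run` is expected: `run` launches the program without waiting for it to finish, so the action has no exit code from the script to act on, whereas `wait` blocks until the program ends and records its exit code. The bash sketch below is only a rough analogy for that difference and does not involve BigFix at all: a background job's launch status says nothing about its result until the shell waits for it.

```shell
#!/bin/bash
# Rough analogy only: BigFix "run" vs "wait" compared with a shell job.

# Like "run": start the job in the background and carry on immediately.
( sleep 1; exit 3 ) &
launch_status=$?   # always 0: it only says the job was started
job=$!

# Like "wait": block until the job ends, then collect its own exit code.
wait "$job"
job_status=$?      # 3, the code the job actually exited with

echo "launch status: $launch_status, job exit code: $job_status"
```

This prints `launch status: 0, job exit code: 3`.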

The problem lies within your script. Examine the logs it generates and implement proper error handling; if conditions don’t match, make sure the script exits gracefully. BigFix is executing the commands as requested, but your script may be running into issues that cause it to hang, potentially leaving the BESClient process running indefinitely.
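A minimal sketch of that advice, assuming bash: stop on unset variables instead of silently expanding them to empty strings, log every decision to a file, and finish with an explicit exit status so the caller can tell success from failure. The `log` and `main` helpers and the default log path are illustrative, not taken from the original script.

```shell
#!/bin/bash
# Defensive skeleton for a script launched by an agent such as BESClient.
set -u   # abort on unset variables instead of silently using an empty string

LOG_FILE="${LOG_FILE:-/tmp/zombie_cleanup.log}"   # hypothetical default path

log() {
    printf '%s - %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
}

main() {
    local zombies
    zombies=$(ps -e -o pid=,stat= | awk '$2 ~ /^Z/ {print $1}')
    if [ -z "$zombies" ]; then
        log "No zombie processes found."
        return 0   # nothing to do counts as success
    fi
    log "Zombie processes found: $(echo "$zombies" | tr '\n' ' ')"
    # Handle each zombie here and return non-zero if any step fails.
    return 0
}

main
status=$?
log "Finished with status $status"
# The real script would end with: exit "$status"
```

Ending with an explicit `exit "$status"` is what lets a `wait` line in the action, followed by a check of the exit code, distinguish a clean run from a failed one.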

Okay, the thing is, the script works fine when run manually on the endpoint: it exits without any errors. With BigFix, though, it gets stuck on the wait command. I have been debugging, and so far I’ve seen that as soon as I introduce a loop in my bash script, it gets stuck on the wait command, even though it still kills the process and creates the error and output files.

Is looping not supported in BigFix?

Yes, BigFix can achieve that functionality. However, as I mentioned earlier, there seems to be an issue with your script. When I tested it on my two test boxes, RHEL and CentOS, both ran into endless execution loops. After running your script manually, I identified the root cause of the problem. While I’m not a Linux expert, I recommend thoroughly validating and correcting your script with appropriate guidance.

Update:

This fixlet is stopping the BESClient process as well. I have no idea why: the zombie array only stores the PIDs of zombie processes, and the parent ID filter accurately fetches just the parent PID of each zombie process.

Check whether it is also stopping your BESClient service.

Very likely one of the processes you’re trying to kill has BESClient as its parent process.
Otherwise, you probably need to do a bit more debugging on the shell script; possibly save its output to a log file so you can see what it’s trying to do.
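One way to test that theory without killing anything is a dry run that prints, for every zombie, the parent’s PID and command name, and appends the result to a log file. A sketch, assuming bash and procps `ps` (as on RHEL/CentOS); the `report_zombies` name and the log path are only examples:

```shell
#!/bin/bash
# Dry run: list every zombie with its parent's PID and command name.
# No signals are sent, so this is safe to run on a production endpoint.
LOG="${LOG:-/tmp/zombie_dryrun.log}"   # example path; change as needed

report_zombies() {
    # The trailing "=" in each -o field suppresses the header line.
    ps -e -o pid=,ppid=,stat= | awk '$3 ~ /^Z/ {print $1, $2}' |
    while read -r zpid ppid; do
        parent_cmd=$(ps -p "$ppid" -o comm= 2>/dev/null)
        printf 'zombie %s  parent %s (%s)\n' "$zpid" "$ppid" "${parent_cmd:-gone}"
    done
}

report_zombies | tee -a "$LOG"
```

If a line shows the BESClient process as the parent, killing that parent is exactly what takes the client down.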

Also note that BESClient doesn’t spawn the shell as a login shell, so it doesn’t process your dot-files and may be missing some environment variables or have a different PATH value. If you need some of those variables, try launching the shell with the --login option.
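To check what the script actually receives, it can log whether bash considers itself a login shell (`login_shell` is a read-only `shopt` option) and which PATH it sees, and then set PATH explicitly instead of relying on dot-files. A sketch; the PATH value is an assumption typical of RHEL/CentOS:

```shell
#!/bin/bash
# Log the environment the script actually gets when BESClient starts it.
if shopt -q login_shell; then
    shell_kind="login"
else
    shell_kind="non-login"
fi
echo "shell: $shell_kind, user: $(id -un), PATH: $PATH"

# Do not depend on ~/.bash_profile or /etc/profile being read:
# set the PATH the script needs explicitly (assumed value, adjust per system).
export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
```

To follow the --login suggestion instead, the action line would become `wait /bin/bash --login "{parameter "ErrorFolder"}/cpu.sh"`, which makes bash read /etc/profile and the user’s profile before running the script.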

Got it!!
It was somehow picking up the parent process of the fixlet’s “run chmod 775” command, which is BESClient, and stopping it, which stopped the service. Because I was using kill -9, which terminates a process forcefully without giving it a chance to react, the logs never showed that the client had received a shutdown request. I switched to wait instead of run and added a few filters to my kill-PPID function so that it skips any BESClient process it may catch instead of killing it.
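The final filtering code wasn’t posted. A sketch of the same idea, assuming bash and procps `ps`: before killing a zombie’s parent, skip PID 1, every ancestor of the script itself (BESClient is one of them when it launches the script), and any process named in a protected list. `PROTECTED_NAMES`, `is_protected`, `cleanup_zombies`, and `DRY_RUN` are illustrative names, and the process name `BESClient` is an assumption to verify with `ps` on your endpoints. Nothing is killed unless `DRY_RUN=0`.

```shell
#!/bin/bash
# Sketch: kill the parents of zombie processes, but never PID 1, this script's
# own ancestors (e.g. BESClient -> bash), or any process named in PROTECTED_NAMES.
# DRY_RUN=1 (the default here) only reports what would be killed.
DRY_RUN="${DRY_RUN:-1}"
PROTECTED_NAMES="BESClient systemd init sshd"   # hypothetical list; adjust per site

# Space-separated PIDs of this shell and every ancestor above it, except PID 1.
ancestors=" "
pid=$$
while [ -n "$pid" ] && [ "$pid" -gt 1 ]; do
    ancestors+="$pid "
    pid=$(ps -p "$pid" -o ppid= | tr -d ' ')
done

# Succeeds (returns 0) if the given parent PID must not be killed.
is_protected() {
    local ppid=$1 name
    [ "$ppid" -le 1 ] && return 0                    # init or the kernel
    [[ "$ancestors" == *" $ppid "* ]] && return 0    # our own process chain
    read -r name < <(ps -p "$ppid" -o comm=)         # read trims padding
    [[ " $PROTECTED_NAMES " == *" $name "* ]] && return 0
    return 1
}

cleanup_zombies() {
    ps -e -o pid=,ppid=,stat= | awk '$3 ~ /^Z/ {print $1, $2}' |
    while read -r zpid ppid; do
        if is_protected "$ppid"; then
            echo "skip zombie $zpid: parent $ppid is protected"
        elif [ "$DRY_RUN" = 1 ]; then
            echo "would kill parent $ppid of zombie $zpid"
        elif kill -9 "$ppid" 2>/dev/null; then
            echo "killed parent $ppid of zombie $zpid"
        else
            echo "failed to kill parent $ppid of zombie $zpid"
        fi
    done
}

cleanup_zombies
```

Skipping the script’s own ancestors does more than the name list alone: it also covers the case from this thread, where the zombie left behind by `run chmod` had BESClient, the process waiting on the script, as its parent.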

Thanks everyone for your replies.