OSD install fails at rbagent rad-updateidlescheme

Ok, so I have several OSD servers that I’m installing. One of them is on an isolated network and cannot reach any of our other OSD servers. I’m using the Bare Metal Server Manager dashboard to deploy this new OSD host.

The system has made it as far as “Post-Install Tasks” and appears to be stuck running the command
rbagent.exe -d -v 4 rad-updateidlescheme. There is nothing present in the logfile that it’s trying to generate.

So I’m trying to run the command line manually, and I am observing that it is trying to contact one of our existing TPMfOSD servers on port 4013. There is no network connectivity to the older TPMfOSD server though.
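For anyone wanting to confirm the same symptom, a quick TCP reachability check against port 4013 can be sketched as follows. This is only an illustration, assuming Python is available on the host; `can_reach` is a hypothetical helper, not part of any OSD tooling, and it says nothing about what rbagent does once connected.

```python
import socket


def can_reach(host: str, port: int = 4013, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 4013 is the port rbagent was observed contacting in this thread;
    the function itself is a hypothetical helper for illustration only.
    """
    try:
        # create_connection raises OSError (timeout, refused, unreachable)
        # when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_reach` with the old server's address (Y.Y.Y.Y here) should return False on the isolated network, which would be consistent with the long timeouts seen from rbagent.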

I allowed the command to run a long time, and eventually I got this output:
Bootable: \\.\PHYSICALDRIVE0
Connect X.X.X.X -> Y.Y.Y.Y
Device \\.\PhysicalDrive0 (0:0) is a regular disk
Hard disk geometry: ff 3f
Non-USB: \\.\PhysicalDrive0
tcp_readdata: received 2920 instead of 16384, time elapsed 30000
tcp_readdata: received 2920 instead of 16384, time elapsed is 30001

Why is rbagent trying to reach one of my other TPMfOSD servers? How can I complete the installation without that occurring? This new OSD server is not going to have network connectivity to the other OSD servers.

In that post, “Y.Y.Y.Y” is the address of an earlier TPMfOSD server that this OSD host will never be able to reach on the network.

On the new server I found \Program Files\Common Files\IBM Tivoli\rbagent.conf referenced this (incorrect) IP address. I updated it with the address of this new OSD server, and was able to run the rbagent.exe command line manually.
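To spot a stale address like this before the install stalls, the config file can simply be searched as plain text. The sketch below assumes nothing about the rbagent.conf format beyond it being readable text in the system's default encoding; `lines_mentioning` and `scan_file` are hypothetical helper names.

```python
import re
from pathlib import Path


def lines_mentioning(text: str, address: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines containing the exact address.

    The digit lookarounds keep 10.0.0.1 from also matching 10.0.0.10.
    """
    pattern = re.compile(r"(?<!\d)" + re.escape(address) + r"(?!\d)")
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


def scan_file(path: str, address: str) -> list[tuple[int, str]]:
    """Search a text file (for example rbagent.conf) for a given IP address."""
    # errors="replace" avoids a crash if the file holds unexpected bytes.
    return lines_mentioning(Path(path).read_text(errors="replace"), address)
```

Running `scan_file` on the rbagent.conf path above with the old server's address would list any lines that still point at it.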

It’s a little tricky to do this in the middle of the installation, though, and still let the rest of the multi-action group complete. We need the log generated by the BigFix task to have correct output or the multi-action group will fail. What I did was

  1. Run rbagent.exe manually and output to \program files\common files\ibm tivoli\updateidlescheme2.log
    (the updateidlescheme.log is still locked by the BigFix task)

  2. Use Process Explorer to determine the PID of the copy of rbagent.exe that BigFix is executing

  3. Use taskkill.exe /pid [RBAGENT_PID] & move /y updateidlescheme2.log updateidlescheme.log
    -> This replaced the updateidlescheme.log before BigFix knew that rbagent.exe had been killed, and the deployment task is now continuing.
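Steps 2 and 3 can be sketched without Process Explorer. This is a rough illustration that swaps in the standard Windows command `tasklist /fo csv /nh` for the PID lookup and parses its output; the helper names are hypothetical, and the code only builds the command line from step 3 rather than running it. It assumes the manual rbagent.exe run from step 1 has already exited, so the only remaining rbagent.exe is the one BigFix started.

```python
import csv
import io


def pids_for_image(tasklist_csv: str, image: str = "rbagent.exe") -> list[int]:
    """Extract PIDs for one image name from `tasklist /fo csv /nh` output.

    Each CSV row starts with the image name followed by the PID. Rows that
    do not fit that shape (such as an INFO message) are skipped.
    """
    rows = csv.reader(io.StringIO(tasklist_csv))
    return [
        int(row[1])
        for row in rows
        if len(row) >= 2 and row[0].lower() == image.lower() and row[1].isdigit()
    ]


def swap_command(
    pid: int,
    new_log: str = "updateidlescheme2.log",
    locked_log: str = "updateidlescheme.log",
) -> str:
    """Build the kill-and-replace command line from step 3 (not executed here)."""
    return f"taskkill.exe /pid {pid} & move /y {new_log} {locked_log}"
```

The resulting string is meant to be pasted into a command prompt opened in the \program files\common files\ibm tivoli folder, so that both log file names resolve.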

So…why would a new OSD server have an rbagent.conf referencing an existing TPMfOSD server? Is this going to be a problem for me later?

I think this is because the server image was originally deployed using the old TPMfOSD server, and had an existing rbagent.conf pointing at that server. I think it’s a bug that the Bare Metal Server Manager does not create a new rbagent.conf when deploying a new OSD server…but the PMR site seems to be down right now.