OSD WinPE cannot connect to server - Connection refused

Hi all,

I’m new to BigFix and having difficulties getting OSD network boot media to work.

I keep getting a “connection refused: invalid password” error when booting endpoints to the network boot media.

I’ve tested this from several endpoints on different VLANs using both the server’s IP and DNS names. I’ve tried setting a few different passwords in the media creation wizard, but no change. Is this password supposed to be just for the boot media, or is this a password that has to match something set previously on the server side?

I feel I must be missing an important step of configuration/setup from the server side, but being new to all this I can’t figure out what.

Any help appreciated.

Cheers

Hopefully you can get a better reply than this, but maybe this can point you in the right direction. That looks like an rbagent password issue to me.

When you first set up the OSD server(s), you set a username and password for the server. This can be used when logging in to the Web Interface (if you have it turned on), and also is used for communication between the rbagent and the rembo server.

I think the configurations are under \Program Files\Common Files\IBM Shared, but don’t quote me on it. Under there you should find an rbagent.conf and a rembo.conf. The rembo.conf has an encoded version of the password, which should match the password in the rbagent.conf. And you should try to make that password the same across all of your OSD servers as well, or you’ll end up in situations where you can’t use the same media across servers.

If I recall correctly, there was a problem in some version or other of the OSD installer: if the soon-to-be server already had an rbagent.conf (with an old password from whichever OSD server originally loaded it), the installer did not replace it with a newer version made for the new server. It may also keep the source server’s IP address embedded in that rbagent.conf, which causes all sorts of weirdness (like a sync action directed at the new server taking effect on the older server, because rbagent.conf pointed at the older server).

anyway… find the rbagent.conf and rembo.conf and make sure they match up.

Thanks for the reply Jason.

I’ve checked both those files on my OSD server and the passwords do match.

Any other ideas?

Cheers

I’m still grappling with this issue.

The next step is to remove the Bare Metal service and re-add it, but if anyone has any other suggestions before I do that, I’m all ears.

Cheers

Hi.
Jason’s idea was indeed good, but I suspect the password file in question is the one that is generated and placed on the deployment media.

When you are creating offline boot media, you indicate the server password in this dialog.

Hope this helps.

Just a quick update to say thanks for all the help. We got it working by reinstalling the Bare Metal service on the server and setting the passwords to match.

It was just a simple case of not understanding how this worked, coming from an SCCM background where the boot media can have its own password set independently of the server.

Cheers

Hi All,

I’m reopening this topic as this issue is still occurring, but now with a twist.

We have 2 main models of desktop in our environment, the Dell 7050 and 3040 both SFF.
We are able to use USB Network boot media to boot both models successfully anywhere on the network.
We can use PXE to boot the 7050’s anywhere, but the 3040’s only seem to PXE boot and successfully load the profile selection screen in certain network locations.


This is the error seen. The machine has network connectivity, IP address etc. and can ping the OSD server without issue.

We have not been able to isolate why some network locations work and others do not.
If a 3040 is PXE booted and fails, and is then moved to a location where PXE booting normally works, it continues to fail with the above error. It’s as if the problem ‘sticks’ with the machine after it has had the first failure.

I have tried deleting affected machine records from the database without success. We have also seen this issue occur for all 3040’s previously, which was fixed with an update to the WinPE NIC drivers; however, this only ever fixed it for the 3040’s in the network locations where it continues to work now.

Does anyone have any idea what could be causing this strange, very specific issue?

Cheers

It looks like “connection refused: invalid password” is just cut off the top of the screenshot. Do you still have copies of OSD servers or boot media with differing passwords floating around?

You can retrieve a hash of the passwords from rembo.conf on the server and from rbagent.conf on the client. The hash values should match each other.

In some configurations your endpoints can get a copy of WinPE downloaded to them and then boot from the local disk into WinPE. If that’s happened, then an rbagent.conf with an incorrect password could have been cached on the endpoint, which might explain why the problem “sticks” when you move the machine.

Strongly recommend that if you have multiple OSD servers, you ensure they have the same password. I’ve had to force a password sync on some of mine by copy/pasting the password hash into rembo.conf and/or rbagent.conf and restarting the bare metal server.
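
The hash comparison described above can be sketched in Python. This is only an illustration under assumptions: it takes the rbagent.conf entry to be a single "server:hash" line (the layout reported further down this thread for X:\rbagent.conf), and it expects the encoded password from rembo.conf to be pasted in by hand, because the layout of rembo.conf is not described here. The function names are hypothetical and not part of any BigFix tool.

```python
from typing import Optional


def extract_agent_hash(rbagent_text: str) -> Optional[str]:
    """Return the hash part of the first 'server:hash' line, or None.

    ASSUMPTION: the client entry looks like '<server-ip>:<hashed-password>'.
    Adjust the parsing if your rbagent.conf is laid out differently.
    """
    for line in rbagent_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last colon so the server part stays intact.
        server, sep, hashed = line.rpartition(":")
        if sep and server and hashed:
            return hashed.strip()
    return None


def hashes_match(rbagent_text: str, server_hash: str) -> bool:
    """Compare the client-side hash with a hash copied from the server."""
    agent_hash = extract_agent_hash(rbagent_text)
    return agent_hash is not None and agent_hash == server_hash.strip()
```

Usage would be along the lines of hashes_match(open("rbagent.conf").read(), "<hash copied from rembo.conf>"). A False result on a failing endpoint would point to stale media or a stale cached WinPE rather than a network fault.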

Thanks for the reply Jason.

I found the rembo.conf, and the rbagent.conf exists on a working machine (with the correct password) however I can’t seem to find it on a machine that is not working. It does not seem to exist or get created.

Is it possible that a different WinPE is somehow getting downloaded for one particular model on one particular subnet even though we only have the one OSD PXE server?
The IP address of the OSD server it is trying to connect to is correct so it’s not going elsewhere.

Cheers

In WinPE, it can be useful to hit CTRL-C to break out of the startnet.cmd batch file and examine the environment. There should be an rbagent.conf in the WinPE environment that you can check.

You can resume rbagent by running “startnet.cmd”, or open a second command prompt first via ‘start cmd.exe’.

Thanks Jason.
Yes, this is how I’ve been accessing WinPE.

What I’m saying is that under X:\ on a working machine there is a file ‘rbagent.conf’ with the OSD server IP:hashed password, however on a non-working machine this file does not exist.

What is the mechanism that creates the ‘rbagent.conf’ file? Should it already be a part of the WinPE image or does it get generated somehow after PXE booting?

I should also add, I tried manually creating the file on a broken machine, re-running startnet.cmd and it still does not work.

Cheers

I’m really not sure how it gets created (I’m a customer, not IBM/HCL). If I had to hazard a guess, I’d think it’s a difference between “network boot media” and “offline install media”.

At least now you can tell it’s not random, and get the usb stick out of circulation.

Thanks Jason.

I realise you are not IBM, and really appreciate your help.

Not sure if I was clear above, but this issue is occurring only with PXE booting. There is no USB boot media that causes this problem. In fact, using USB network boot media is our current workaround, and it is looking more and more likely to be the mechanism with which we roll out Windows 10 to our fleet of 5000+ machines, in lieu of PXE fully functioning.

Trying to tweak things further yesterday with drivers I somehow managed to break PXE booting for all 3040’s now…hopefully the fix for that ends up fixing the other network specific issue.

Cheers

Oh, that really is strange that PXE is not working but boot media is. I usually see problems the other way around - PXE works but network boot media fails.

When booted via PXE, if you break out of the startup script and drop to the shell, are you getting an IP address, able to ping your gateway, etc.?

Yes, when I break out to the shell, everything seems fine from a network connectivity perspective. I can ping the gateway and the OSD server fine. A tracert from a working machine and a non-working machine is identical.
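
Since ping and tracert only exercise ICMP, a TCP-level check can complement them. The following is a minimal Python sketch, assuming it is run from a full OS on the affected subnet (stock WinPE has no Python). The function name is hypothetical, and the port the OSD server listens on is not stated in this thread, so it must be supplied by whoever runs it.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    A refused, filtered, or timed-out connection returns False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, and unreachable hosts.
        return False
```

If this returns True from a subnet where PXE fails, plain reachability is unlikely to be the cause, which would be consistent with the error being misleading.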

I think the error is misleading, as I have managed to effect changes to the problem by changing drivers for 3040’s in the past. The strange part is how it only works in some network locations and no matter what I do to the drivers, those same network locations never work via PXE for 3040’s. It works fine for USB however.

In desperation, I’ve just tried binding every Realtek based network driver I can find in my Drivers library to the 3040 for WinPE 10 x64 and am hoping for the best.

Cheers

Hmm. If it’s any consolation it looks like you’re not alone, as far as Dell 3040’s having trouble in WinPE.

One thing to note that burned me, is that in OSD there is only one of each version of WinPE available at a time. For example if you’ve recently made a WinPE 10 build 1803 or 1709, that replaced any earlier WinPE 10 on your bare metal servers. Even if you keep an older Deployment Resource around, the WinPE would have been replaced. Is the WinPE you get from PXE the same version as your boot media version?

Well it looks like my last ditch effort adding all the drivers actually worked!

I ended up binding a total of 8 drivers, some of which are duplicates and now it’s working.

Why on earth the problem manifested itself only on certain networks still baffles me, but hopefully this helps someone else with the same problem.

Cheers

Hi,
“Added by user” drivers should not be used for PCI drivers; those should instead be bound to the PCI device they refer to.
Adding more than one driver for the same device could cause conflicts that stop it from working. I suggest trying different drivers for the network card one by one, binding each to the device, until you find the right one, and then removing all the “added by user” drivers.

Yes - aware of this.
As it only affects one model, and only on the WinPE image, and we’ve had such issues getting it working, I will just be leaving it unless there is a reason to revisit.
Yes, it’s possible that I only needed one of those drivers to make it work, but I did previously add a number of these same drivers without success, which is why I just went for it and added them all.

If it works without issue I’m not going to touch it again now. :slightly_smiling_face:

Cheers

Wonderful to read. It will help with configuration.