LDAP Login issues post July 2023 patching?

Hello,

We are dealing with a mind-numbing issue. It is not confirmed, but since last week's patching across the infrastructure, users have been having issues logging into the BigFix Console/WebUI.

All users are LDAP (Active Directory) users. We have a case open with HCL support, but so far we’ve not gotten very far…

There are two domains set up (imagine EU.dom and NA.dom), and each configured AD server points to CN=EU or CN=NA.

A user will be able to log in fine one minute, and the next they will get “incorrect username or password”. That is the root issue. The debug console logs and server audit logs state the same thing, so there is no new info to go off of…

Support requested that we not use backup AD servers. This was done, with no difference…
I’ve used LDP on the BigFix server (all components Windows, Server 2019), and I never get any query issues when contacting the same DC configured in BigFix.

Anyone else seeing any issues?

Need a little more info on the setup…

  • Are these two domains part of the same Forest?

  • Are you using one “LDAP Directory” setup with a DC from one domain as primary and DC from another as backup?

If EU and NA are two completely separate Domains/Forests, they should each have a separate LDAP Directory set up in the Console. When logging in, users should specify username@na.whatever or username@eu.whatever (assuming their userPrincipalName attributes match the domain DN).

If NA and EU are two domains within the same Forest, you could use a single LDAP Directory setup in the Console, but all the directory servers you specify should be Global Catalog domain controllers, and you should specify the Global Catalog ports for connecting (3268 or 3269) (test connectivity to those ports first via LDP).
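If LDP is not handy, the port check can be roughed out in a few lines of Python. This is only a minimal sketch: it confirms that a TCP connection can be opened from the root server to each port, not that a bind or search would succeed. The host name in the usage comment is a placeholder, not a real server.

```python
import socket

# Standard Active Directory LDAP ports, including the Global Catalog ports.
AD_PORTS = {
    389: "LDAP",
    636: "LDAPS",
    3268: "Global Catalog",
    3269: "Global Catalog over TLS",
}


def check_ports(host, ports=AD_PORTS, timeout=3.0):
    """Return {port: bool}: True if a TCP connection to host:port could be opened."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts, and name resolution failures.
            results[port] = False
    return results


# Usage (hypothetical DC name, replace with one of your own):
# for port, reachable in check_ports("dc1.na.example").items():
#     print(port, AD_PORTS[port], "open" if reachable else "unreachable")
```

Run it from the root server itself, since that is the machine that performs the LDAP connection.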

In your LDAP Directory configuration, does your server have a direct connection to the directory servers you are specifying, or does it go through some kind of load-balancer, LDAP proxy, or DNS round-robin configuration? Based on your symptoms I would guess that authentication works when you hit some particular DC but does not work when the server tries a different DC, either because it’s blocked or because users from one domain cannot auth when the server chooses a DC from the other Domain.

Hi Jason,

Each domain is set up as its own “LDAP Directory” in BigFix. Oddly enough, in testing, when it works users can specify NA.DOMAIN\user, user@NA.DOMAIN or even simply USER without specifying the domain. When it does not work, no combination works. We’ve asked users to only use USER@NA.DOM specifically, but the result remains the same when it fails…

The LDAP directory is pointed towards a specific, unique DC server, not an LDAP proxy/round-robin configuration.

Right now all we can do is guess, because it seems BigFix lacks the needed debug logging here…

Adding some info,

At this specific site we have BigFix 9.5.12 (old BigFix) and 10.0.9 running.
Both systems point to the same DC. Logins sporadically fail only on 10.0.9… and it seems only since the July patches.

Something else to mention: some users are using their Windows session creds, most are not. Users who check the box to use their Windows session creds never have a login issue… At a base level, what is the difference between specifying user/pass and using the checkbox?

The “use Windows credentials” checkbox performs a Kerberos authentication, using the existing login session’s Kerberos TGT to obtain a session ticket to the BigFix service. LDAP is not involved in this authentication.

Entering a username & password performs an LDAP BIND authentication between the root server and the LDAP server.

Hi Jason, appreciate the info on that point

For July, Microsoft started enforcing stricter controls on Kerberos and NetLogon RPC, but I haven’t heard any indication this should affect BigFix. Earlier registry keys to disable these protections should no longer be effective. And if these were the issue I’d still expect consistent behavior: either the login should work, or it shouldn’t. Have you already followed Support’s advice to remove your backup LDAP servers from the configuration, so you’re only using one server for each domain now?

https://support.microsoft.com/en-us/topic/kb5020805-how-to-manage-kerberos-protocol-changes-related-to-cve-2022-37967-997e9acc-67c5-48e1-8d0d-190269bf4efb#timing

https://support.microsoft.com/en-us/topic/kb5021130-how-to-manage-the-netlogon-protocol-changes-related-to-cve-2022-38023-46ea3067-3989-4d40-963c-680fd9e8ee25#timing5021130

Hi Jason, yes. While working with us, support requested that we remove all backup DCs; this has been done, thank you.

I am testing and looking with Wireshark on my end to see if something comes up dirty.
One thing to note: 9.5.12.0 didn’t support SSL/TLS to AD. Regardless, this was working fine for at least two weeks before issues appeared Monday (after patch weekend).

I appreciate your feedback

For whatever it’s worth, my lab setup is using a Windows Server 2019 domain controller that had the July rollup installed on July 18 automatically by my Patch Policy schedule. I haven’t seen any problem with LDAP logins on my root server running 10.0.9.

I did code up a short test in Python, to send 50 logon attempts using the REST API and report the statistics. If that sounds useful at all to you, feel free to grab a copy at https://github.com/Jwalker107/BigFix/blob/master/API%20Samples/api_logon_test.py

Note - by default this tries 50 API logons with a 0.1 second delay between each attempt. If you leave an incorrect password in the script it will very likely trigger account lockout on the Domain. Because these are independent logons (no persistent Session is reused), these will show up as separate logon attempts in server_audit.log.
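For anyone who just wants the general shape of such a test, here is a rough sketch of the idea. It is not a copy of the linked script. It assumes the root server answers REST requests on the default port 52311 and that a GET to /api/login with HTTP Basic credentials returns 200 on success; the server name and account in the usage comment are placeholders. The same lockout warning applies.

```python
import base64
import ssl
import time
import urllib.error
import urllib.request


def bigfix_login_attempt(server, user, password, verify_tls=True):
    """One independent logon. Assumes GET /api/login on port 52311 with Basic auth."""
    url = f"https://{server}:52311/api/login"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    context = ssl.create_default_context()
    if not verify_tls:
        # Only for servers with a self-signed certificate.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(request, context=context, timeout=30) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # The server answered with an error status (for example 401).
        # Connection-level errors are not caught, so they stop the test
        # instead of being counted as failed logons.
        return False


def run_logon_test(attempt, count=50, delay=0.1):
    """Call attempt() count times, pausing delay seconds between calls."""
    succeeded = 0
    for i in range(count):
        if attempt():
            succeeded += 1
        if i < count - 1:
            time.sleep(delay)
    return {"attempts": count, "succeeded": succeeded, "failed": count - succeeded}


# Usage (hypothetical server and account):
# stats = run_logon_test(
#     lambda: bigfix_login_attempt("bigfix.example", "user@NA.DOM", "password")
# )
# print(stats)
```

Each call builds a fresh request with no session reuse, so every attempt should appear as its own logon in the audit log.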

Hi Jason,

The continued support here is much appreciated. For now we’ve simply disabled SSL while we build infra to test this specific issue, thank you.

We built a new server that is not patched to July, and we are seeing the same sporadic issues when logging in.
We will need to see exactly what is happening on the DCs. I’ll keep posting updates here in case anyone else runs into this in the future.

Update on this,

We enabled BESrelay verbose logging on the root server, and we are now seeing a bind on 636 to the FQDN of the domain and not the DC, i.e. costco.local:636 vs dc1.costco.local:636.

We were never seeing an actual connection attempt/denial on the DC configured in BigFix; we were seeing traffic on the 3269 port.

At this point we will chalk it up to being an issue on the infra, and there is not much we can do with BigFix.
The “crazy part” though is that it’s sporadic… so it’s a coin flip on each connection attempt, and you sometimes hit the same DC and are happy, I guess?
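One possible explanation for the coin flip: in Active Directory the domain's DNS name typically resolves to the addresses of every DC in that domain, so a connection to the domain FQDN can land on a different DC each time. A small sketch to compare what the domain name and the configured DC name resolve to, as seen from the root server (the names in the usage comment are the ones from this thread; substitute your own):

```python
import socket


def resolve_ipv4(name, port=636):
    """Return the sorted, de-duplicated IPv4 addresses that name resolves to."""
    infos = socket.getaddrinfo(name, port, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def addresses_beyond(domain_ips, dc_ips):
    """Addresses the domain name can resolve to that are not the configured DC."""
    return sorted(set(domain_ips) - set(dc_ips))


# Usage:
# domain_ips = resolve_ipv4("costco.local")
# dc_ips = resolve_ipv4("dc1.costco.local")
# print("Domain name resolves to:", domain_ips)
# print("Configured DC resolves to:", dc_ips)
# print("Other possible targets:", addresses_beyond(domain_ips, dc_ips))
```

If the last list is not empty, each of those addresses is a DC worth checking for reachability on 636/3269 and for a valid LDAPS certificate.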

Fun times!