Delay in BESClient connecting to relay after refresh

darroch · September 26, 2024, 4:51pm

Hi,

I am seeing different results when sending a refresh to clients. Is there a client setting that delays the ‘check-in’ after a refresh?

Example 1 (Fast)

Inbound Refresh:
11:09:59.971330 IP 11.100.110.40.64175 > 11.100.111.31.52311: UDP, length 21

11:10:22.231559 IP 11.100.111.31.57596 > 11.100.110.40.52311: Flags [S], seq 3824809577, win 29200, options [mss

Refreshes after ~22 seconds.

Repeated:

11:54:37.961870 IP 11.100.110.40.53088 > 11.100.111.31.52311: UDP, length 21
11:54:59.701478 IP 11.100.111.31.58736 > 11.100.110.40.52311: Flags [S], seq 1874878431, win 29200, options [mss 1460,sackOK,TS val 877097013 ecr 0,nop,wscale 7], length 0

Refreshes after ~22 seconds.

Example 2: (Slow)

Inbound Refresh:
15:05:49.319464 IP 11.44.1.80.52711 > 11.39.137.216.52311: UDP, length 21

Outbound Connection to relay:
15:19:44.054577 IP 11.39.137.216.45506 > 11.44.1.80.52311: Flags [S], seq 899643279, win 29200, options [mss 1460,sackOK,TS val 592378239 ecr 0,nop,wscale 7], length 0

~14 min after the refresh.

Above repeated:

Inbound refresh (plus an extra UDP packet):

15:37:18.826947 IP 11.44.1.80.63924 > 11.39.137.216.52311: UDP, length 21
15:43:37.367040 IP 11.44.1.80.63954 > 11.39.137.216.52311: UDP, length 25

Outbound Connection to relay:

15:57:14.250984 IP 11.39.137.216.48526 > 11.44.1.80.52311: Flags [S], seq 3958611954, win 29200, options [mss 1460,sackOK,TS val 594628436 ecr 0,nop,wscale 7], length 0

~20 min after the refresh.

Example 3 (Slow)

Inbound Refresh:
11:45:11.057883 IP 11.100.19.78.58082 > 11.100.16.219.52311: UDP, length 21
11:45:25.541815 IP 11.100.19.78.57742 > 11.100.16.219.52311: UDP, length 25

Outbound Connection to relay:
11:46:52.731992 IP 11.100.16.219.33000 > 11.100.19.78.52311: Flags [S], seq 1254322808, win 29200, options [mss 1460,sackOK,TS val 1478685317 ecr 0,nop,wscale 7], length 0

There is a second UDP packet after 14 seconds (4 bytes larger).

Then an outbound connection to the relay 1 min 41 sec after the first UDP packet.

Example 4:

Inbound Refresh:
11:18:55.534637 IP 11.100.19.78.59920 > 11.100.16.219.52311: UDP, length 21

Outbound Connection to relay:
11:22:55.865463 IP 11.100.16.219.58728 > 11.100.19.78.52311: Flags [S], seq 1227527142, win 29200, options [mss 1460,sackOK,TS val 1477248456 ecr 0,nop,wscale 7], length 0
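
The delays quoted in the examples above can be checked by subtracting the tcpdump timestamps. Below is a minimal Python sketch, not part of the original capture workflow: the helper names `parse_time` and `refresh_delay` are made up for illustration, the TCP lines are trimmed after the flags field because only the leading timestamp is used, and it assumes each pair of lines comes from the same day (or crosses midnight at most once).

```python
from datetime import datetime, timedelta


def parse_time(line: str) -> datetime:
    """Return the leading HH:MM:SS.ffffff timestamp of a tcpdump line."""
    return datetime.strptime(line.split()[0], "%H:%M:%S.%f")


def refresh_delay(udp_line: str, syn_line: str) -> timedelta:
    """Time between the inbound UDP refresh and the outbound TCP SYN."""
    delta = parse_time(syn_line) - parse_time(udp_line)
    # Assumption: tcpdump prints no date, so a negative result means
    # the capture crossed midnight once.
    if delta < timedelta(0):
        delta += timedelta(days=1)
    return delta


# First UDP refresh and first SYN from each example in the post
# (TCP lines trimmed after the flags field).
captures = {
    "Example 1": (
        "11:09:59.971330 IP 11.100.110.40.64175 > 11.100.111.31.52311: UDP, length 21",
        "11:10:22.231559 IP 11.100.111.31.57596 > 11.100.110.40.52311: Flags [S]",
    ),
    "Example 2": (
        "15:05:49.319464 IP 11.44.1.80.52711 > 11.39.137.216.52311: UDP, length 21",
        "15:19:44.054577 IP 11.39.137.216.45506 > 11.44.1.80.52311: Flags [S]",
    ),
    "Example 3": (
        "11:45:11.057883 IP 11.100.19.78.58082 > 11.100.16.219.52311: UDP, length 21",
        "11:46:52.731992 IP 11.100.16.219.33000 > 11.100.19.78.52311: Flags [S]",
    ),
    "Example 4": (
        "11:18:55.534637 IP 11.100.19.78.59920 > 11.100.16.219.52311: UDP, length 21",
        "11:22:55.865463 IP 11.100.16.219.58728 > 11.100.19.78.52311: Flags [S]",
    ),
}

for name, (udp, syn) in captures.items():
    print(f"{name}: {refresh_delay(udp, syn).total_seconds():.1f} s")
```

Run against a longer capture, the same subtraction makes it easy to see which clients respond in seconds and which take minutes.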

JasonWalker · September 26, 2024, 5:21pm

Ideally you shouldn’t be sending manual refreshes to clients, unless you think something has changed and need them to report new results.

‘Force Refresh’ doesn’t mean ‘send a heartbeat now’. It means ‘stop all evaluations, request updates for every site, re-evaluate all properties, and send a new Full Report’.

All the Analysis things that are tuned to run once an hour, once a day, or once a week (because they can impact performance, are slow to calculate, or generate a large number of results) have to be re-evaluated immediately so a new Full Report can be posted.

Using ‘Force Refresh’ on a few clients can slow down those clients’ next reports; a ‘Force Refresh’ to a huge number of clients can severely impact performance on the Relays and Root Server as they have to process the backlog of larger Full Reports.

darroch · September 26, 2024, 7:28pm

Hi Jason,

Many thanks for the reply. This was identified during some troubleshooting.

I am seeing a specific set of servers not taking changes, or taking a long time to take changes (RHEL clients, running v11.x). I have noticed that they are all 7 hops from the relay.

My question was more around the large differences between the working (almost instant) and the slower ones - is this expected behavior?

Regards,

JasonWalker · September 26, 2024, 7:43pm

I’d probably be checking the client logs, specifically whether the client is receiving GatherHashMV and ForceRefresh messages. If it doesn’t receive these quickly on new content or a force refresh, UDP to the client may be blocked, and I’d be checking firewalld on the client or firewalls/switches between Relay and Client.
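
One way to do that log check is to filter the client log for the two message names mentioned above. The following is a rough Python sketch under stated assumptions: it only looks for the literal strings `GatherHashMV` and `ForceRefresh`, the exact wording and layout of the log lines is not confirmed here, and `find_notifications` and `scan_file` are hypothetical helper names. The log path is left to the caller because it depends on the installation.

```python
from typing import Iterable, List, Tuple

# Message names taken from the reply above; how they appear inside a
# real client log line is an assumption.
KEYWORDS: Tuple[str, ...] = ("GatherHashMV", "ForceRefresh")


def find_notifications(
    lines: Iterable[str], keywords: Tuple[str, ...] = KEYWORDS
) -> List[str]:
    """Return the lines that mention any of the given message names."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(keyword in line for keyword in keywords)
    ]


def scan_file(path: str) -> List[str]:
    """Filter one log file; undecodable bytes are replaced, not fatal."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_notifications(handle)
```

Calling `scan_file` on the current client log and comparing the matching lines with the time the refresh was sent should show whether the UDP notification arrived promptly, late, or not at all.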

JasonWalker · September 26, 2024, 7:46pm

Ah, ok, it took me a minute to find, but check the tips at Tip: Troubleshooting Client Responsiveness and please let me know whether they’re helpful!