Error : PostResultsForwarder (HTTP Error 503): Parent relay is busy, backing off

Any one help me with this .
PostResultsForwarder (HTTP Error 503): Parent relay is busy, backing off.

All the machines are in Offline state along with the relay…

Kindly suggest

This happens when your server cannot write entries to the database faster than the requests from your clients are coming in. I’ve seen this problem a lot of times when did a “Force Refresh” on a lot of machines at once. With all the clients sending their reports at once, it was a lot of information for the core to process at one time so it would reject any reports until it had cleared it’s 3MB queue of things to be processed.

To confirm, check the relay diagnostics of your core server: https://your-core-server-name:52311/rd/

If it’s at 100%, new reports will not be processed until it is cleared. Also ensure that the connection between your server and database is active with no errors.

Hey @jmaple

how do we resolve this issue

Stop trying to refresh the clients and let the core catch up its reports. It just takes time for it to process everything but it will come through in the end. It just takes time.

Is there any way to manually say restart the service and do it.As we have lot of endpoints not reporting

Just because the console says they haven’t reported in so many minutes doesn’t mean they didn’t attempt to send reports. hence the error. The core just needs to catch up to the reports it does need to process and nothing can really happen until that completes. In my experience, it’s a waiting game until all reports are processed. Someone else may have a better answer?

You need to check the diagnostic pages of your root server and relays. There is also a performance log you can enable and check the speed of ingestion there.

You could have something offline or otherwise having issues, or it could just be that your root server is too slow at ingesting reports.

Send out a blank action to all endpoints and see if that shows up in any of the client logs. If so, then everything is working on the action side of things.

You might consider setting the client setting to limit the minimum report interval.

1 Like

Hi , what would be solution currently because still machines are not reporting… And same error in relay diagnostics .
Warning! The FillDB buffer directory is overloaded and new posts will not be accepted.
Can restart the filldb multiple times resolve this issue ?
Or Can i increase the filldb files size of Root server for a while ?

I have checked the filldb it increases to 9-10 MB and then clears it…so no stale file are there.
All relays are showing the same…but it clears for sometime and fills it again

Please suggest as my environment is not working and lots of activities are in queue.

You didn’t provide us enough information to know if things were working until now.

If all of your relays & root FillDB buffer fills but then empties, then that means everything is working, it just can’t keep up.

I really need the answer to these questions to help:

  • How many relays do you have?
  • How many endpoints do you have?
  • Do you have top level relays? (what does your relay structure look like?)
  • What does the storage volume your SQL db is on look like?
  • How is your root server volumes structured in general?
  • Do you have at least a Gigabit between all of your relays and root?
  • Do you have 10gigabit between your top level relays and root?

You should increase the max size of the FillDB buffer a bit on your root so that it can do more in batches. Don’t increase it too much or it could make the situation worse. You need to increase the maximum size, the maximum number of files and the maximum time it will wait before processing a batch.

You definitely need to increase the minimum report time AND the minimum analysis interval.

Hello Guys…

Sorry for trouble. The Filldb was carrying the corrupt data which stuck and made the performance issue.

I have cleared it multiple times .now its working properly…

Anyways thanks for your suggestion.

Any suggestion on Corrupt Data ?

1 Like

@ swap041
How did you manage to find that Filldb running the corrupt data ? and how did you manage to clear the corrupt data ?

1 Like

Just simply change the preference from the console i.e increase the heartbeat interval , sending referesh interval and wait for sometime.

when all your incoming and waiting reports are processed you will start seeing the better results post that you can change the preferences again.