Root API "server too busy"

We occasionally get the response “server too busy” when pulling back applicable content from a computer via the Root Server REST API:

GET https://root:52311/api/computer/computerID/tasks

Does the Root server do rate limiting of some sort? If so, what are those thresholds?
Is there anyplace where these events are logged?
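While waiting for an answer on thresholds, one client-side mitigation is to retry with exponential backoff when the body says "server too busy". This is only a sketch under my own assumptions (the `fetch` callable and its `(status, body)` return shape are hypothetical stand-ins for a real `requests.get()` against the Root Server endpoint above):

```python
import time

def get_with_retry(fetch, max_attempts=5, base_delay=1.0):
    """Retry a zero-argument `fetch` callable while the server is too busy.

    `fetch` returns a (status, body) tuple; in real code it would wrap an
    HTTP GET against the Root Server REST API endpoint shown above.
    """
    for attempt in range(max_attempts):
        status, body = fetch()
        if status == 200 and "server too busy" not in body.lower():
            return body
        # Exponential backoff: wait 1s, 2s, 4s, ... before the next attempt.
        time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("Root Server still too busy after retries")
```

This does not answer the rate-limiting question, but it keeps a polling script from hammering a server that is already struggling.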

Hi cstoneba,
If this happens to you a lot, it would be useful if you could collect the deadlock log (SQL Server: Turn On Deadlock Trace Flag & DB2: Collecting data: DB2 Deadlocks ). Also I think your problem could be related or similiar to this one https://www.ibm.com/support/pages/users-logged-webui-are-suddenly-logged-out. Take a look and let me know if you need something else.

Hi, I am seeing some deadlock errors in the filldb.log.

Fri, 04 Oct 2019 22:01:36 -0500 -- 2416 -- Encountered error during long property results update: Database Error: [Microsoft][SQL Server Native Client 11.0][SQL Server]Transaction (Process ID 61) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. (40001: 1,205) (RESULTS LOST)

Fri, 04 Oct 2019 22:01:36 -0500 -- 2416 -- Error storing reports: Database Error: [Microsoft][SQL Server Native Client 11.0][SQL Server]Transaction (Process ID 61) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. (40001: 1,205)

I’ll have to investigate those in SQL. But do you think they would cause the Root server API to respond with “server too busy”?

Most likely yes; let me know what you find in the SQL deadlock log.

@ 2019-10-11 22:05:00.247 :

spid 61 is being blocked by spid 195, and in turn spid 195 is being blocked by spid 61; spid 61 was chosen as the deadlock victim.

spid 61 query:

merge into LONGQUESTIONRESULTS as q
using (values(@P1,@P2,@P3,@P4,@P5,@P6,@P7,@P8,@P9,@P10))
  as v(siteid, analysisid, propertyid, computerid, isfailure, isplural, resultscount, resultstext, reportnumber, webuisiteid)
on q.SiteID=v.siteid and q.AnalysisID=v.analysisid and q.PropertyID=v.propertyid and q.ComputerID=v.computerid
when matched then update set IsFailure=v.isfailure, IsPlural=v.isplural, ResultsCount=v.resultscount, ResultsText=v.resultstext
when not matched then insert(SiteID, AnalysisID, PropertyID, ComputerID, IsFailure, IsPlural, ResultsCount, ResultsText, WebuiSiteID)
  values(v.siteid, v.analysisid, v.propertyid, v.computerid, v.isfailure, v.isplural, v.resultscount, v.resultstext, v.webuisiteid);

spid 195 query:

DELETE TOP (@BatchSize) LONGQUESTIONRESULTS
FROM LONGQUESTIONRESULTS L
WHERE SiteID = L.SiteID AND AnalysisID = L.AnalysisID AND PropertyID = L.PropertyID
  AND ( NOT EXISTS ( select C.ComputerID FROM Computers C WHERE C.ComputerID = L.ComputerID )
     OR EXISTS ( select C.ComputerID FROM Computers C
                 WHERE C.IsDeleted = 1 AND C.ComputerID = L.ComputerID
                   AND DateDiff(day, C.LastReportTime, GetUTCDate()) > @InactiveDays ) )

It looks like the BES Computer Remover is running. If so, it should be scheduled for when FillDB is less loaded, and the batch size should not be set too large, in order to avoid deadlocks.

BTW, this is for HCL incident CS0052264.

Yes, we have the BES Computer Remover running every 6 hours with a batch size of 500,000. This BigFix deployment/SQL server is used very heavily, so there is no known low-use window.
Do you think that batch size is too large?

Yes, I think that is quite a large value.

Do you have any recommendations?

The default batch size is 10,000 and works well for most customers. The computers matched by the filters still get removed; they are just deleted in several smaller batches, so other queries are not blocked by database locking for as long.
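To make that trade-off concrete, here is a minimal sketch (plain Python, not product code, and the row totals in the comments are hypothetical) of how batch size maps to the number of short lock-holding delete statements:

```python
def delete_in_batches(rows, batch_size):
    """Simulate a batched delete: return how many DELETE TOP (@BatchSize)
    statements are needed to remove `rows` rows. Each statement holds its
    locks only for its own batch, so other queries can run in between."""
    return -(-rows // batch_size)  # ceiling division

# Hypothetical totals for illustration:
# 500,000 rows with batch size 500,000 -> 1 long lock-holding statement;
# the same rows at the default 10,000 -> 50 short statements.
```

The total work is the same either way; the smaller batch size just releases locks 50 times along the way instead of once at the end.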

Are there any reports one could run that would help determine what the batch size should actually be?

We’ve set the batch size for the scheduled Computer Remover and the scheduled Audit Trail Cleaner from 500,000 down to 10,000, and we’ll see if that helps with the deadlocks.
Thanks.

We’re still getting BES Root API responses where there is no relevant task list for a computer ID, yet we have no deadlock occurrences in the SQL logs.
Any other suggestions?

I think you should work through the support ticket, so they can look at your specific configuration.

I am doing that in parallel. Nothing there yet though.

Hi @cstoneba,
When you drive the REST API, do you know the degree of concurrency of the requests?
I.e., how many requests are you issuing in parallel?

It’s a little hard for me to tell, but looking at my root server’s server_audit.log at an arbitrary time, there are 94 API connections within a one-minute window.
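If many of those 94 requests overlap, one option while investigating is to cap client-side concurrency. A sketch under my own assumptions (`fetch` is a hypothetical stand-in for the real API call, and the `MAX_IN_FLIGHT` value is a guess to be tuned, not a documented Root Server limit):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 4  # assumption: tune to what the Root Server tolerates
_gate = threading.Semaphore(MAX_IN_FLIGHT)

def throttled(fetch, computer_id):
    """Run fetch(computer_id) with at most MAX_IN_FLIGHT calls in flight."""
    with _gate:
        return fetch(computer_id)

def fetch_all(fetch, computer_ids):
    # More workers than permits: the semaphore, not the pool, sets the cap.
    with ThreadPoolExecutor(max_workers=16) as pool:
        return list(pool.map(lambda cid: throttled(fetch, cid), computer_ids))
```

Even if the server does its own rate limiting, keeping the client below that ceiling avoids piling up requests that will just be rejected as busy.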

I think we have this resolved. “server too busy” was just one symptom of the root problem. Our DSA SQL server was out of disk space so when it tried to replicate from the Primary Root server every 2 hours, it caused a db lock on the primary SQL. That in turn caused things like the filldb to fill up, webreports to hang, BES root server service API to hang, etc.
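For anyone hitting the same failure mode, a simple scheduled free-space check on the replica could catch this earlier. A hedged sketch (the path and threshold are assumptions; point it at the volume holding the DSA SQL data files):

```python
import shutil

def free_space_gb(path):
    """Return free space, in GiB, of the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return usage.free / (1024 ** 3)

def warn_if_low(path, threshold_gb=50.0):
    # threshold_gb is an assumption; size it to your replication volume.
    free = free_space_gb(path)
    if free < threshold_gb:
        print(f"WARNING: only {free:.1f} GiB free on {path}")
    return free
```

Run it from a scheduler (or a BigFix analysis) so the replica filling up surfaces as an alert instead of as deadlocks and hung APIs on the primary.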

Thanks so much for posting the update! I’ve been trying to get my head around what the problem could be, as I have some customers doing very heavy API calls without seeing that message.

(I hadn’t even thought about asking about DSA).