Optimising relevance query for getting applicable and remediated computer counts

DeepHyde · October 21, 2020, 1:27am

How can I optimize this query to run faster? It takes 2 minutes to run with an environment comprising 12k computers. It queries for total number of applicable and remediated computers for all computers. I need to query this data within a date range which is req.query.dateLower and req.query.dateUpper. Can someone help me out?

(number of computers of results whose (relevant flag of it and last became relevant of it as universal date < “${req.query.dateUpper}” as date and last became relevant of it as universal date > “${req.query.dateLower}” as date) of bes fixlets whose ((fixlet flag of it or task flag of it or baseline flag of it) and name of it as lowercase does not contain “corrupt patch” and name of it as lowercase does not contain “superseded” and (Applicable Computer Count of it > 0 )), number of computers of results whose (remediated flag of it and last became nonrelevant of it as universal date < “${req.query.dateUpper}” as date and last became nonrelevant of it as universal date > “${req.query.dateLower}” as date) of bes fixlets whose ((fixlet flag of it or task flag of it or baseline flag of it) and name of it as lowercase does not contain “corrupt patch” and name of it as lowercase does not contain “superseded”))

JasonWalker · October 21, 2020, 2:41am

Ahh, a good problem to illustrate the issues with a Cross Product. The main performance factor I’d expect here would not be the number of computers, but the number of fixlets.

Here, you are basically retrieving every fixlet twice. The first time you retrieve every fixlet to count the computers that are still relevant to it; the second time, you retrieve every fixlet and count the computers that are not relevant, but once were relevant.

The issue then, is that every fixlet from the first group is paired to every fixlet of the second group.

To illustrate, consider this cross-product. Think of “a”, “b”, and “c” as “the first set of fixlets, that have relevant computers”, and “1”, “2”, “3” as “the second set of fixlets, that once had relevant computers”.

q: (("a";"b";"c"), ("1"; "2"; "3"))
A: a, 1
A: a, 2
A: a, 3
A: b, 1
A: b, 2
A: b, 3
A: c, 1
A: c, 2
A: c, 3
T: 0.121 ms
I: plural ( string, string )

The total number of results from this would be (number of relevant fixlets X number of once-relevant fixlets).

If you have one hundred fixlets in each set, there would be ten thousand results. Each of those ten thousand results is retrieving a count of the relevant results for the first fixlet, and the once-relevant results for the second fixlet - that’s a count calculation that would be performed some twenty thousand times.

What would be more efficient, is to retrieve each “interesting” fixlet one time, and then count both the relevant and once-relevant computers for each fixlet. That avoids the cross-product, by not pairing every fixlet to every other fixlet.

Something like

(
  number of computers of results whose (relevant flag of it and last became relevant of it as universal date < "${req.query.dateUpper}" as date and last became relevant of it as universal date > "${req.query.dateLower}" as date) of it

, number of computers of results whose (remediated flag of it and last became nonrelevant of it as universal date < "${req.query.dateUpper}" as date and last became nonrelevant of it as universal date > "${req.query.dateLower}" as date) of it

, name of it
) of bes fixlets whose (
     (fixlet flag of it or task flag of it or baseline flag of it) 
      and name of it as lowercase does not contain "corrupt patch" 
      and name of it as lowercase does not contain "superseded"
)

This will return a three-item tuple. For that hypothetical set of 100 fixlets, you would now have a total of 100 results in the form of

( # of currently-relevant computers, # of once-relevant computers, name of fixlet)

You also may want to add a whose () filter on the result tuple. A fixlet that is not now relevant, and never was relevant, may not be needed in the result. So you could trim that further to include only the ones where the “currently relevant” or “recently fixed” are not zero -

(
  number of computers of results whose (relevant flag of it and last became relevant of it as universal date < "${req.query.dateUpper}" as date and last became relevant of it as universal date > "${req.query.dateLower}" as date) of it

, number of computers of results whose (remediated flag of it and last became nonrelevant of it as universal date < "${req.query.dateUpper}" as date and last became nonrelevant of it as universal date > "${req.query.dateLower}" as date) of it

, name of it
) whose (item 0 of it > 0 or item 1 of it > 0 )

of bes fixlets whose (
         (fixlet flag of it or task flag of it or baseline flag of it) 
          and name of it as lowercase does not contain "corrupt patch" 
          and name of it as lowercase does not contain "superseded"
    )

You should note though that is is retrieving all of the fixlets, tasks, and baselines from all of the sites. The next optimization would be to reduce the number of fixlets/tasks that you’re examining. Rather than our hypothetical set of 100 fixlets, you’re likely actually checking more like thousands, and yielding millions of results now. Probably a lot more than you’d want. I’d suggest adding a filter on the the ‘bes fixlets’ to only retrieve the Sites that you need, such as
name of site of it = "Enterprise Security" for the “Patches for Windows” site. Otherwise you’ll also be retrieving everything from “BES Support”, “Master Action Site”, etc.
It’s also worth knowing that many Tasks will always be relevant, and usually don’t mean that the system is misconfigured or missing anything; a task for “Restart a service” for instance may always be relevant, so you might not want to report on the Tasks either.

edit: I had a copy/paste issue, and was counting the “remediated” computers twice and not counting the “relevant” computers at all. Fixed above.

DeepHyde · October 21, 2020, 2:47am

Awesome idea of adding a whose filter on the result tuple. I did try the solution of querying the fixlets once and getting the needed computers counts from it. But, the issue was it was always separating each computer counts for each fixlet instead of giving me a total amount for all fixlets… Like So:
[[12,12],[12,12],[12,12] ].
But I require that same dataset to result like so :
[36,36]
How can I achieve this?

JasonWalker · October 21, 2020, 2:56am

Ok, that’s an entirely different kind of problem then but the same principal applies - we need to avoid the cross-product of pairing every fixlet against every other fixlet.

We also want to get a total sum of all currently relevant computers for all fixlets, and a total sum of all recently-relevant computers for all fixlets, if I understand this correctly.

The key then to avoid a cross-product is to first build a set of the fixlets of interest, loop through the set to get the “currently-relevant” count, and then loop through the set again to get the “recently-relevant” count. We would want to ensure that what each item of the tuple returns are the total sums, and not keeping an item for each fixlet - just the sums, so there’s no cross-product.

Instead of processing the tuple for each fixlet, we’d use the ‘set of bes fixlets’ to make the main tuple only run once, but iterate over the whole set of fixlets within the inner loop…

I don’t have my server running right now, but I think this would do.

(
 sum of numbers of computers of results whose (relevant flag of it and last became relevant of it as universal date < "${req.query.dateUpper}" as date and last became relevant of it as universal date > "${req.query.dateLower}" as date) of elements of it

, sum of numbers of computers of results whose (remediated flag of it and last became nonrelevant of it as universal date < "${req.query.dateUpper}" as date and last became nonrelevant of it as universal date > "${req.query.dateLower}" as date) of elements of it

)
of set of bes fixlets whose (
         (fixlet flag of it or task flag of it or baseline flag of it) 
          and name of it as lowercase does not contain "corrupt patch" 
          and name of it as lowercase does not contain "superseded"
    )

The previous advice on limiting the set of fixlets/tasks/baselines you examine definitely still applies though.

DeepHyde · December 18, 2020, 3:10pm

Sorry for the delayed response, but thank you so much for helping me out. I learnt very important stuff through this post.