Session relevance in batches

(imported topic written by ae_phoenix91)

I have a session relevance query that pulls many properties for all computers in BigFix, which is used to feed another system. The issue I’m facing right now is that the result set is too large for the other system to process. Is there a way I can limit the results to batches of 100 at a time? I would equate this to an “IN” clause in SQL: I would first query BigFix for a listing of all computer names, then issue the session relevance for 100 computers at a time. I’m not sure if this is the best way (or even possible) though. I attached the relevance for review. Any help is appreciated.
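To make the idea concrete, the batching I have in mind would look roughly like this (the host names below are placeholders, not real machines, and the properties are just examples rather than the attached query). First get the full list of names:

  names of bes computers

Then, for each batch of 100 names, run the property query filtered to just that batch, along the lines of:

  (name of it, id of it) of bes computers whose (name of it = "HOST-001" or name of it = "HOST-002" or name of it = "HOST-003")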

(imported comment written by BenKus)

Hi ae_phoenix,

That is quite a query…

The short answer is that you cannot “chunk” a relevance query automatically. You will need to do one of the following:

  • Try to modify the relevance expression to chunk manually. For instance, you can change ‘… bes computers …’ to ‘… bes computers whose (id of it mod 10 = 0) …’, then run the same query with ‘… bes computers whose (id of it mod 10 = 1) …’ and so on, chunking it into 10 different queries (see the sketch after this list). You could also separate by computer name or other attributes.

  • Get the data in memory in one big query and then break it into chunks in the application you are using to get the results.

  • Write the data to a file in chunks and insert it into the other system in chunks.
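A minimal sketch of the first option, with placeholder properties standing in for whatever the original query actually pulls; run one query per remainder, 0 through 9:

  (name of it, id of it, last report time of it) of bes computers whose (id of it mod 10 = 0)
  (name of it, id of it, last report time of it) of bes computers whose (id of it mod 10 = 1)
  …
  (name of it, id of it, last report time of it) of bes computers whose (id of it mod 10 = 9)

Each query returns roughly a tenth of the computers, and together the ten result sets cover every computer exactly once, since every id has exactly one remainder mod 10.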

Ben

(imported comment written by jessewk)

That is quite a query. How long does it take to run?

You might get better performance for the query itself by making ‘bes computers’ the last tuple item instead of the first.

As a general rule for performance, you want to structure tuples such that, given n items, items 1 through n-1 are singular and only the nth item is plural.
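A hedged illustration of that ordering, using placeholder inspectors rather than the attached query (‘number of bes fixlets’ simply stands in for whatever singular value the real query needs on each row). Both forms return the same rows; the difference is only in how the tuple is structured.

Plural item first (the shape to avoid):

  (name of item 0 of it, item 1 of it) of (bes computers, number of bes fixlets)

Plural item last (the shape to prefer, per the tip above):

  (item 0 of it, name of item 1 of it) of (number of bes fixlets, bes computers)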

(imported comment written by ae_phoenix91)

It takes about 6 seconds to process 100 computers in our development system. I needed to break it into smaller chunks since our production system has around 10,000 computers.

Thank you for the performance tip. This query looks the way it does because I’m pretty new to writing session relevance and used the Excel connector to build it. I will try the performance tip as well as the mod approach and see if that gets me what I need. Thank you.

(imported comment written by gdaswani91)

jessewk - thanks for the tip.

I’m doing something similar (to feed HP Asset Manager), but in our case we have over 220,000 computers that need to be processed daily.

I was definitely surprised that BigFix does not support some sort of paging mechanism in its web service. What I’m planning on doing is querying the BigFix web service every hour for machines (last report time of it > now - 1*hour) and pulling data in parallel (multiple simultaneous web service calls, each returning a different property set). I’ll then combine the records in memory (using the computer id as the joining key) and write them out in batches (also multi-threaded). The mod approach suggested earlier in the thread will most likely lead to simpler logic (I’ll also try it out).
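A hedged sketch of that hourly filter, with placeholder property selections standing in for whatever the Asset Manager feed actually needs:

  (id of it, name of it, last report time of it) of bes computers whose (last report time of it > (now - 1*hour))

Only computers that have reported within the last hour are returned, which keeps each hourly pull small.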

If you have access to some J2EE developers, take a look at Spring Batch; it does a lot of the heavy lifting.

BigFix tells me their reporting server caches a lot of data; I guess I’ll find out what happens when it’s hammered by multiple requests, each returning a sizeable chunk of data.

(imported comment written by bolson5591)

Did you have any luck finding a way to import the 220,000 records into HP Asset Manager in a timely manner? We are having the same issues. Thanks in advance…

(imported comment written by BenKus)

Hi bolson,

What exact problem are you running into? Does the query take too long? What exact data are you importing?

Ben