Use `unique values` to collapse duplicate results to make expensive queries more efficient

See this relevance: https://bigfix.me/relevance/details/3017859

 unique values of (concatenations ":" of parenthesized parts of matches (regular expression "([0-9A-Fa-f]{2})[ :-]?([0-9A-Fa-f]{2})[ :-]?([0-9A-Fa-f]{2})[ :-]?([0-9A-Fa-f]{2})[ :-]?([0-9A-Fa-f]{2})[ :-]?([0-9A-Fa-f]{2})") of it) of unique values of matches (regular expression "(([0-9A-Fa-f]{2}[ :-]?){5}([0-9A-Fa-f]{2}))") of unique values of values of results of bes properties whose(name of it as lowercase contains "mac address")

One thing you may notice is that it uses unique values 3 times within the relevance to collapse the duplicate results. You could actually only use unique values at the very end (or in this case the beginning, since relevance goes from right to left) and get the same result, but it would actually be slower!!!

Why would doing something 2 extra times, that does not actually change the final result be faster? Well the reason is that the unique values operation is significantly faster than the matches (regular expression "ā€¦ operation. By using unique values multiple times, Iā€™m reducing the number of times that the regular expression operations need to be performed, thereby speeding up the query.

This is actually a problem if you wanted to know the exact number of occurrences of the results by using multiplicity of unique values because you can only calculate unique values once to get an accurate count of the number of occurrences. The reason this is true is the other uses of unique values will hide duplication from the final count. There are cases where this may be exactly what you need, while in others it is exactly NOT what you need.

It is also worth noting that if the total number of items being queried is small and the duplication is small, then over use of unique values could actually make the query slower by adding unnecessary steps, though I will say this is not something I worry about much when using unique values sparingly since it tends to be rather fast to execute.


Related:

1 Like