Relevance to retrieve a set of computer objects with duplicate names

Hi!
As the subject says, I need to retrieve a set of computers (not a list of names) that share a duplicated property value.

Here are my relevance statements to get the names of the computers:

(it) of unique values whose (multiplicity of it > 1) of names of bes computers

or

((unique values of values of results ( (bes property whose (name of it = "Computer Name"), bes computers))) whose (multiplicity of it > 1))

Thanks

So are you saying this is how you did it, or are you asking how to do it? Given that you are putting this in the “Tips” category, I’m assuming this is an answer rather than a question.

This seems to be how to find the set of computers with duplicate names. Is that correct?

I’m sorry.
These relevance statements return a list of STRINGS with the computer names.
I need a set of computers (bes computers) that I can use as objects, so that I don’t have to iterate over all computers again and can avoid a very long response time (my environment has nearly 100,000 computers).

Sorry for my bad English

Here is an idea for you.
Looks like you wanted to get Computer objects, rather than just computer names as strings.

This statement will give you a set of Computers:

set of bes computers whose (name of it as lowercase is contained by "|computer1|computer2|computer3|")

This is your original statement slightly modified:

concatenation of ("|" & it) of unique values whose (multiplicity of it > 1) of names of bes computers

If we combine the 2 statements, you should get what we are looking for:

names of elements of set of bes computers 
    whose (name of it as lowercase is contained by (
                concatenation of ("|" & it as lowercase) 
                of unique values whose (multiplicity of it > 1) 
                of names of bes computers
                )
    )
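One thing to watch with the delimiter-string approach, as far as I can tell: `is contained by` between two strings is a substring test, so a name that happens to be a prefix of a duplicated name can match too. Here is a minimal Python model of that, with made-up computer names (this is not relevance, just the same logic written out):

```python
from collections import Counter

# Hypothetical computer names; "pc1" is unique, "pc10" is duplicated.
names = ["pc1", "pc10", "pc10", "web01"]

# Model of: unique values whose (multiplicity of it > 1) of names of bes computers
duplicated = sorted(n for n, c in Counter(names).items() if c > 1)

# Model of: concatenation of ("|" & it as lowercase) of <duplicated names>
haystack = "".join("|" + n.lower() for n in duplicated)

# Model of: name of it as lowercase is contained by <haystack> (plain substring test)
substring_hits = [n for n in names if n.lower() in haystack]

# Stricter variant (an assumption, not from the post above): wrap the name in
# delimiters and close the haystack with a trailing delimiter before testing.
strict_hits = [n for n in names if "|" + n.lower() + "|" in haystack + "|"]

print(substring_hits)  # includes "pc1" even though it is not duplicated
print(strict_hits)     # only the two "pc10" entries
```

If that matters in your environment, wrapping the name in delimiters on both sides is one way to tighten the match.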

Thanks Lee!
This is a great solution and we already tried it in our environment; however, it takes tens of minutes due to the number of our computers (about 100,000).

Haven’t read it in detail … yet :smile: but I’d say this is because of recursion in the relevance.
So we’ll need to tweak the relevance to get around that - should be possible though.

Out of curiosity, what is the use case here? What are you trying to achieve with this set of computers with duplicate names?

You’re right :smile:
I need several properties of the set of computers with duplicate names.

This will return the bes computer objects

items 0 of it whose (name of item 0 of it = item 1 of it) 
of 
(bes computers, unique values whose (multiplicity of it > 1) of names of bes computers )

This removes the recursion. It does not return a set, but you can get access to all the computer properties like this:

(name of it, id of it) 
of 
items 0 of it whose (name of item 0 of it = item 1 of it) 
of 
(bes computers, unique values whose (multiplicity of it > 1) of names of bes computers )

...

ctnr.ub14.emmx-010.2, 5220398
ctnr.ub14.emmx-010.2, 9950220

Shout if you’d like an explanation of how this type of relevance works :slight_smile:

Thanks for the reply!
I saw the message and immediately evaluated the relevance in my test environment (about 800 computers): response time 2 seconds :wink:
Then I ran it on our production environment (100k clients) and it’s still evaluating after 25 minutes :frowning: … and it’s a 16 core machine with 128GB of RAM :blush:

Can you try the two parts of this last clause individually to see whether one of them is the culprit?

(bes computers, unique values whose (multiplicity of it > 1) of names of bes computers )

UPDATE
Just tried the individual queries - they run fine with 250,000 endpoints.
I’m seeing it slow down when the two queries are used to form the tuple, even when there’s only one entry on the right of the tuple.

Refactored to the relevance below, and with 250,000 endpoints I’m getting a sub-second response.

It seems that constructing the tuple with the two bes computer queries in the same tuple was the problem.

I don’t have any duplicate names, but if I put in some computer names for the last clause it works too.

(name of it, id of it) 
of 
items 0 of it whose (name of item 0 of it = item 1 of it) 
of (bes computers, it) 
of (unique values whose (multiplicity of it > 1) of names of bes computers)

This seems like one of the better options, but the size of the tuple will grow significantly given a larger set of duplicately named computers.

In fact, the size of the tuple will be:

(number of bes computers) * (number of unique values whose (multiplicity of it > 1) of names of bes computers)
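To make that product concrete, here is a rough Python model of the cross-product approach (not relevance, and the inventory is made up). It pairs every computer with every duplicated name and counts how many pairs the filter has to look at:

```python
from collections import Counter
from itertools import product

def duplicates_via_cross_product(computers):
    """computers: list of (name, id) pairs. Returns (matches, pairs_examined)."""
    counts = Counter(name for name, _ in computers)
    dup_names = [n for n, c in counts.items() if c > 1]
    pairs_examined = 0
    matches = []
    # Model of: items 0 of it whose (name of item 0 of it = item 1 of it)
    #           of (bes computers, <duplicated names>)
    for (name, cid), dup in product(computers, dup_names):
        pairs_examined += 1
        if name == dup:
            matches.append((name, cid))
    return matches, pairs_examined

# Hypothetical inventory: 6 computers, 2 duplicated names ("a" and "b").
inventory = [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("c", 5), ("d", 6)]
matches, pairs = duplicates_via_cross_product(inventory)
print(matches)
print(pairs)  # 6 computers * 2 duplicated names
```

Scaled up, 100,000 computers with, say, 1,000 duplicated names (a made-up figure) would mean 100,000,000 pairs to filter.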

Try this:

(name of it, id of it) of items 1 of (it, bes computers) whose(name of item 1 of it is contained by item 0 of it) of ( set of unique values whose (multiplicity of it > 1) of names of bes computers )

The size of the tuple that this relevance builds will never be larger than:

(number of bes computers)
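A rough Python model of this set-based version (again not relevance, with a made-up inventory): the duplicated names are collected into a single set once, so each computer is paired with that one set and the number of pairs equals the number of computers.

```python
from collections import Counter

def duplicates_via_set(computers):
    """computers: list of (name, id) pairs. Returns (matches, pairs_examined)."""
    counts = Counter(name for name, _ in computers)
    # Model of: set of unique values whose (multiplicity of it > 1) of names of bes computers
    dup_names = {n for n, c in counts.items() if c > 1}
    pairs_examined = 0
    matches = []
    # Model of: items 1 of (it, bes computers)
    #           whose (name of item 1 of it is contained by item 0 of it)
    for name, cid in computers:
        pairs_examined += 1
        if name in dup_names:
            matches.append((name, cid))
    return matches, pairs_examined

# Hypothetical inventory: 6 computers, 2 duplicated names ("a" and "b").
inventory = [("a", 1), ("a", 2), ("b", 3), ("b", 4), ("c", 5), ("d", 6)]
matches, pairs = duplicates_via_set(inventory)
print(matches)
print(pairs)  # one pair per computer
```

How fast the membership test is inside BigFix is not something this model can tell you; it only illustrates why the tuple stays small.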

This post is related:

While dividing the query did work as someone suggested, I was looking for a more “elegant” approach since I have to use this code extensively across multiple queries.

My original query didn’t complete at all despite removing the timeout (about 20 minutes when the server did not crash); gearoid’s query was a HUGE improvement, taking just 40 seconds!
But the real surprise came with jgstew’s solution: it runs in just 2 seconds!!! AWESOME!

Many thanks to every one of you guys!
You made my day! You rock! :wink:

I’m glad that it worked!

Thanks for the feedback. I was only able to test it across a small subset of machines in a much smaller BigFix instance, so I really didn’t know for sure how fast it would work.

It is really a challenge when trying to write nested relevance like this. You often have to start on a smaller/test instance just to get something working, and then optimize it. You then need to test the optimized relevance on a larger instance to see what impact you are making with the optimizations because you can’t always tell in a BigFix instance with very few clients.

The real trick here is that recursive relevance or multiply nested relevance can be very expensive. Using tuples can be more efficient and avoid some of the repetition, but building the tuple is itself expensive when the tuple is very large. Even though @gearoid’s tuples and mine contain effectively the same data, the number of actual entries in the tuple matters.

Can you try this for me:

(name of it, id of it) of items 0 of (bes computers, it) whose(name of item 0 of it is contained by item 1 of it) of ( set of unique values whose (multiplicity of it > 1) of names of bes computers )

This should either be exactly equivalent to my relevance above or slower, and I’m not sure which. It could also be slightly faster, but I doubt it. Knowing which is more efficient would help me pick the best option.

I just finished my write up about comparing many of the different options: The way Session Relevance statements are written matters (Part 2) Computer Names

Any way to do this with a property that is not the computer name?

I’m looking for machines that have duplicate UUIDs or other values. This works for computer names (great by the way), but I would love to use it for so much more!

Any help is appreciated.

If each computer has only 1 value per property result:

(name of it, id of it) of computers of items 0 of (results of bes property "User Name", it) whose(value of item 0 of it is contained by item 1 of it) of set of unique values whose (multiplicity of it > 1) of values of results of bes property "User Name"

Otherwise try this:

(name of it, id of it) of computers of items 0 of (results of bes property "User Name", it) whose(exists elements of intersection of (set of values of item 0 of it; item 1 of it)) of set of unique values whose (multiplicity of it > 1) of values of results of bes property "User Name"
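As a sketch of what the multi-value variant is doing, here is a small Python model (not relevance; the computer ids and values are invented). Each computer's values are turned into a set and intersected with the set of values that occur more than once overall:

```python
from collections import Counter

# Hypothetical results: computer id -> values reported for one property.
results = {
    101: ["alice"],
    102: ["bob", "alice"],
    103: ["carol"],
    104: [],
}

# Model of: set of unique values whose (multiplicity of it > 1)
#           of values of results of bes property "User Name"
value_counts = Counter(v for values in results.values() for v in values)
dup_values = {v for v, c in value_counts.items() if c > 1}

# Model of: exists elements of intersection of
#           (set of values of item 0 of it; item 1 of it)
flagged = sorted(cid for cid, values in results.items() if set(values) & dup_values)

print(dup_values)
print(flagged)
```

Note that in this model a single computer reporting the same value twice would also push that value over the threshold; I have not checked whether the relevance behaves the same way.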

This is great, just one other caveat - any way to filter out the “active” entries of the duplicates and just leave the actual dormant one?

For example, let’s say you have a bunch of machines that for some reason have had their ComputerID reset, so there is one active entry (with the new ID) and one dormant entry (which hasn’t reported since the reset, and that could’ve been 5 minutes ago or 30 days ago). If we can filter out the active ones, we can feed the rest to a REST API script and auto-delete the dormant duplicates while leaving the active ones.

Also, in case anybody else is in a similar situation: we were also interested in comparing the IPs of the machines, so that two machines with the same name but different IPs are not captured by the relevance (we have situations where servers are being failed over to cloud or live data migrations are happening, so both machines are online for a period of time):

(name of it, id of it) of items 0 of (bes computers, it) whose(name of item 0 of it is contained by item 1 of it) of ( set of (preceding text of first " | " of it) of unique values whose (multiplicity of it > 1) of (name of it & " | " & concatenation " | " of (it as string) of ip addresses of it) of bes computers )
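For anyone who wants to follow the logic of that statement, here is a rough Python model (not relevance; names, ids and IPs are invented). It builds a "name | ip | ip" key per computer, keeps the keys that occur more than once, strips them back to the name, and then selects computers by name:

```python
from collections import Counter

# Hypothetical inventory: (name, id, ip_addresses).
computers = [
    ("srv1", 1, ["10.0.0.5"]),
    ("srv1", 2, ["10.0.0.5"]),    # same name and same IP: likely a stale duplicate
    ("srv2", 3, ["10.0.0.7"]),
    ("srv2", 4, ["172.16.0.7"]),  # same name, different IP: e.g. a failover copy
]

def make_key(name, ips):
    # Model of: name of it & " | " & concatenation " | " of (it as string) of ip addresses of it
    return name + " | " + " | ".join(ips)

key_counts = Counter(make_key(name, ips) for name, _, ips in computers)

# Model of: set of (preceding text of first " | " of it) of <keys with multiplicity > 1>
dup_names = {k.split(" | ", 1)[0] for k, c in key_counts.items() if c > 1}

# Model of: items 0 of (bes computers, it) whose (name of item 0 of it is contained by item 1 of it)
flagged = [(name, cid) for name, cid, _ in computers if name in dup_names]

print(flagged)
```

In this model the final selection is by name only, so a third "srv1" with a different IP would also be flagged once two entries share both name and IP.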