Limiting the scope of damage when using the REST API

We are developing an integration with BigFix that will allow us to disable Windows endpoints that are reported missing or stolen. Our plan is to have a group in BigFix with an action assigned to it, such that when a computer is added to this group, the action applies and prevents the stolen computer from accessing data on the system or network. The action will remove the binding to Active Directory and remove all user accounts.
We will control this from our asset system in Salesforce via the BigFix REST API. Once an endpoint is marked as stolen in Salesforce, the BigFix API will be told to add that computer to the BigFix lockdown group.
The problem we are anticipating is inevitable bugs in the code that may cause unexpected things to happen. Our somewhat paranoid, but well-placed, fear is many, or all, computers (tens of thousands) ending up in this group, or some other unexpected command being sent to the API which could harm our environment. We will be implementing redundant safety checks in the code to help ensure this doesn't happen, but ultimately we are looking for a way to limit the damage somehow inside BigFix. Are there any ways we can do this - such as limiting the number of computers that can be added to the group, or enforcing a cap in the action or its relevance somehow? Any ideas greatly welcomed.

Thanks,
Sean

2 Likes

Sean,

I get where you're going and it sounds really cool, but I don't believe the platform + REST API supports your limitation requirements. For native support, it may be necessary to submit an RFE.

That said, you could add a time delay to the group membership, or an intermediate "review" group, to create an additional layer/step before the automated actions you identified are executed on the targeted endpoints.

I’m very interested to hear how this integration progresses in your organization. Definitely a cool BES REST integration.

Best,
@cmcannady

1 Like

As mentioned, BigFix does not provide limiters like the ones you're describing, but you could add a delay or a second factor of validation to your action/group membership to further minimize the risk. For example, a secondary tag could have to be changed on the endpoint to trigger the process: one action moves it into the group, and a separate action tags the endpoint so it can begin the lockdown/cleanup. That second action could be triggered manually, or by additional logic in the automation that looks at a different source, or that arrives at the same conclusion a different way.

Ideally, some additional validation step could be done on the endpoint to confirm it has been stolen - for example, it hits an external web service that confirms the theft, or it attempts to access some corporate resource and fails. The relevance (or actionscript) could check that validation step before proceeding with the actual cleanup.

Thanks for your suggestions, guys - I'll definitely look into them.
One of the requirements is to lock down a computer as quickly as possible, so that precludes any steps which require more time, such as waiting for the computer to report in to BigFix again.
The process of adding a computer to a group via the API seems precarious because there is no easy way to do it with the available REST commands - and of course complexity = risk. Instead of the API providing a simple method like "addComputerToGroup", one must download the group membership as XML, parse that XML to build a list of current members, rewrite the XML to include the new computer, and send it back to BigFix to overwrite the original group. Makes me feel a little uneasy! Validation steps in the code will definitely help, but some kind of limiter set inside BigFix would be far preferable. My gut instinct was that there's some kind of relevance that could do this, but alas I cannot think of any.
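
To illustrate the kind of defensive wrapper we're planning around that round trip, here is a rough Python sketch - the server URL, credentials, group path, and XML element name are all placeholders, and the actual XML rewrite is elided because the schema depends on your group definition:

import requests

BASE_URL = "https://bigfix.example.com:52311/api"            # placeholder root server
AUTH = ("api_user", "api_password")                          # placeholder credentials
GROUP_URL = BASE_URL + "/computergroup/custom/Lockdown/123"  # placeholder group resource

def add_to_lockdown_group(computer_id):
    resp = requests.get(GROUP_URL, auth=AUTH)
    resp.raise_for_status()
    old_xml = resp.content
    old_count = old_xml.count(b"<ComputerID>")  # placeholder membership element name

    # build_membership_xml is a hypothetical helper that parses the group XML
    # and re-emits it with the new computer appended to the membership list.
    new_xml = build_membership_xml(old_xml, computer_id)
    new_count = new_xml.count(b"<ComputerID>")

    # Defensive limiter in the integration itself: never grow the group by
    # more than one computer per call, and never shrink it.
    if new_count != old_count + 1:
        raise RuntimeError("refusing to overwrite group: %d -> %d members"
                           % (old_count, new_count))

    requests.put(GROUP_URL, data=new_xml, auth=AUTH).raise_for_status()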

Thanks

Understanding that your client will only execute your "stolen-laptop" process if it is online long enough to see that it has been added to the "stolen-laptop" group, I don't see that this is the best approach.

Rather, I think you would need an action targeted at your Domain Controller (or an LDAP-executing proxy for it) to reject the stolen client machine if it tries to connect again, or else a "dead man's switch" on the client.

That said, you could put an alert in Web Reports, or a fast-running API script, to either warn about, or automatically stop, actions where the "number of reported computers of it" is higher than some safe threshold.

I agree targeting the DC would be a more sound approach. As for your limiter, you could simply have the API return a count of the number of systems and take some action (e.g. send an email) if the number is either higher than a hard-coded limit, or has increased by more than X in the past Y hours/days.
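
A rough sketch of such a watchdog in Python, assuming a group named "Lockdown" - the session relevance, server URL, credentials, and threshold are illustrative only:

import requests

BASE_URL = "https://bigfix.example.com:52311/api"  # placeholder root server
AUTH = ("api_user", "api_password")                # placeholder credentials
HARD_LIMIT = 5                                     # alert if the group grows past this

RELEVANCE = 'number of members of bes computer groups whose (name of it = "Lockdown")'

def lockdown_group_size():
    resp = requests.get(BASE_URL + "/query", params={"relevance": RELEVANCE}, auth=AUTH)
    resp.raise_for_status()
    # The query endpoint answers in XML; a real implementation should parse
    # the <Answer> element properly instead of string-scraping like this.
    return int(resp.text.split('<Answer type="integer">')[1].split("</Answer>")[0])

if lockdown_group_size() > HARD_LIMIT:
    send_alert_email()  # hypothetical notification hook - email, page, stop actions, etc.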

In relation to the above project using the BigFix REST API to trigger an action on missing or stolen computers that will disable them (lockdown): our plan has been to have a group in BigFix with an action assigned to it, such that when computers are added to this group, the action applies and prevents the stolen computer from accessing data on the system or network. What we are finding in testing is that it takes a long time for the lockdown action to apply. It would appear that the computer checks in with BigFix a first time, at which point it only gets added to the Lockdown group, and then needs to check in again (10-15 minutes later?) for the action to actually apply (the action's relevance is that the computer be a member of the Lockdown group). Is my theory correct? If so, is there any way to speed the process up? The problem is that a computer (likely stolen and in criminal hands) needs to stay connected to the Internet for too long - it seems it needs to check in to BigFix twice. We really need the action to apply as quickly as possible. We are open to other ways of implementing this.

Thanks

@jgstew any thoughts?

1 Like

Thanks for tagging me - I used to read everything on the forum, but I haven't had the time lately, so it is genuinely helpful to tag me in something you want my thoughts on, or something you think I may be interested in.

This is definitely an interesting subject for me. I never used this, but I did make this related content in 2012: https://bigfix.me/fixlet/details/683

I think I have an idea of how to implement this and solve these issues, but I need to write it up.

The way I would recommend doing this is to create a custom site for the lost computers, with an automatic group consisting of all computers subscribed to the site. The site would contain analyses that collect extra data about these computers that you wouldn't otherwise want to collect from all systems. The site can also be restricted so that it is only visible to master operators and a select group that may need it.

The site could also contain other fixlets / tasks / etc. that are lower priority, but the main remediation actions you require would need to live in the master action site and target computers that are subscribed to this site. That way no download of site content is required for the remediation actions to run, because they can be policy actions that are already deployed. I would recommend building a delay into them as a failsafe, so that there is time between the computer subscribing to the site and the actions running. You'll have to balance speed against the concern for false positives.

Since marking systems for inclusion in this site (and for the remediation actions) should only be done one system at a time, to prevent accidental execution on many systems, you can use a "key" and a hash of the BigFix computer id of the system to actually subscribe it to the site. This would be done using an action that sets a client setting to the required value for that particular system.

This would be the relevance for the site itself:

exists values whose(it = sha256 of (it as string & "Hce6s7xV5WaQScyD7PpN") of computer ids) of settings "Remediation" of client

In this case, Hce6s7xV5WaQScyD7PpN would be the key that whatever uses the REST API to tag the system must supply when marking a system; it would likely be hard-coded. The thing that tags the system just needs to know the BigFix computer id (or query for it), append the key, take the SHA256 of the result, and then send an action that sets the resulting value in the client setting. The only ones with access to this key would be those who have read access to the custom site's subscription relevance, or those who could derive it from whatever issues the REST API command - but not anyone who could merely see the action in the console.
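
For example, the tagging side could compute the value like this (a minimal Python sketch using the example key above; the computer id is made up):

import hashlib

KEY = "Hce6s7xV5WaQScyD7PpN"  # the shared secret from the site subscription relevance

def remediation_tag(computer_id):
    # Matches the relevance above: the computer id rendered as a string,
    # concatenated with the key, then hashed with SHA256.
    return hashlib.sha256((str(computer_id) + KEY).encode("utf-8")).hexdigest()

print(remediation_tag(1234567))  # value to write into the "Remediation" client setting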

The action that tags the system with the required client setting could also set the BigFix client's CPU usage higher (10%?) and enable command polling every 10 minutes. This would allow BigFix to work much more quickly on the system in general. That said, it is important to note that if the remediation actions are already deployed as policy actions targeting members of the remediation site, then only the tagging of the system is required - no other downloads or communication with the root server are needed, because policy actions already on a system will run once their criteria are met. The client will decide to do this on its own once it is tagged, and the higher CPU usage will help the client figure out that it is now relevant faster, which in turn makes the remediation happen faster.

If the remediation content requires small utilities to be downloaded to do some of the work, those could be pre-cached on all systems from the start in the BigFix client's utility cache, just in case they are needed in the future, and to allow the remediation actions to run even if the BigFix client is offline but running.


2 Likes

Thanks so much for this, @jgstew - some really good ideas in here. Love the SHA256 idea! How would I go about getting a lost computer to subscribe to the site, and how long do you reckon the process would take - from getting the lost computer to subscribe to the site, to the lockdown action occurring?

1 Like

To get a lost computer to subscribe to the site via the REST API, you would have to know its computer id, or look it up. This could be done using session relevance queries, or by looking in the console or Web Reports.
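
For example, something like this (hypothetical computer name, server, and credentials):

import requests

RELEVANCE = 'ids of bes computers whose (name of it as lowercase = "stolen-laptop-01")'
resp = requests.get(
    "https://bigfix.example.com:52311/api/query",  # placeholder root server
    params={"relevance": RELEVANCE},
    auth=("api_user", "api_password"),             # placeholder credentials
)
print(resp.text)  # the computer id appears in an <Answer> element of the XML reply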

Then you would use the "key" (Hce6s7xV5WaQScyD7PpN in the above example) and that ID to compute the SHA256 result needed.

Then you would create an action using the REST API that tags the system with that SHA256, which in the example below is e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, and it should be targeted at the specific computer in question.

Relevance:

not exists values whose(it = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855") of settings "Remediation" of client

Actionscript:

setting "Remediation"="e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" on "{ parameter "action issue date" of action}" for client
setting "_BESClient_Comm_CommandPollEnable"="1" on "{ parameter "action issue date" of action}" for client
setting "_BESClient_Comm_CommandPollIntervalSeconds"="300" on "{ parameter "action issue date" of action}" for client
setting "_BESClient_Resource_WorkIdle"="80" on "{ parameter "action issue date" of action}" for client
setting "_BESClient_Resource_SleepIdle"="500" on "{ parameter "action issue date" of action}" for client

I have no idea - it would partly depend on the _BESClient_Resource_WorkIdle setting. But the important part is that, if the policy action already exists, the only delay is the time it takes for the client to realize the change; it doesn't matter how long it takes to be reflected in the console, and no network connection is required once the tagging has occurred.

You could test the speed of it all by setting up the site and creating a policy action in the master action site that targets members of this site but does nothing other than write out the time to a file. It could also write out the result of now - subscribe time of sites whose(name of it = "Remediation")

This would give you a general idea of how quickly this can happen. You would also want to look at the flow in the console logs.

The real trick is getting the action out there that will tag the client. That requires command polling to already be enabled on the endpoint; otherwise it could take ~6 hours, give or take, unless the client is getting UDP notifications, in which case it should only be a few seconds. In general, mobile clients / laptops should have command polling enabled already, at least once an hour, for everything to work best, regardless of this particular use case.

1 Like

Thanks so much for this information @jgstew - I really appreciate it! I will start testing this week. Cheers!

1 Like

Might I suggest a different approach to the problem? From what I read, what you really want to do is protect the data stored on the device and prevent the machine from using any assigned (or remembered/inherited/etc.) permissions to inflict further data loss, access, or damage.

If that is the case, why not leverage something like BitLocker on the machines. This would ensure all data stored on the disk is stored in an encrypted state, but in the event the machine is stolen/lost/etc, you could leverage a similar REST API process but instead of performing destructive actions, simply set a pre-boot password of your choosing then forcing a machine reboot to activate it. That way, if a system is an accidental reciepient of the lockout action, you could simply tell the user what the password is to get it back online where you can then just take another action to clear the pre-boot password.