SQL Cluster patch scripting

so here’s our situation. first I’ll premise this by saying I’m not a SQL expert, and I’m still wrapping my head around the Relevance, PowerShell and xml needed here - I know enough to know I don’t know enough.

We have several SQL always-on clusters, where at the end of the process we need to have the DB residing back on the primary node - I know clusters shouldn’t need that but for reasons not relevant here, it’s the way they are for now.

our deployment engineer gave us two scripts he’d built for our clusters, one for the DR and Secondary nodes, and one for the Primary nodes. The first script seems to work fine, as it doesn’t need to do anything other than run the baseline and patch, etc etc.

the second script is - using fixlets, supposed to fail the db over from primary to secondary, patch from the baseline and then take the db back to itself, however it’s seeming to fail on the last step.

could anyone take a peek through this sanitized version and let me know what you all think?

--- CONFIG ---

$RootServer = "https://PLACEHOLDER.DOMAIN.COM:52311" # example: https://mybfserver.domain.local:52311
$Username = "PLACEHOLDER" # must have REST API access, master operators are implicitly allowed (real account does have access.)
$Password = "PLACEHOLDER"
$CustomSiteName = "Cluster SQL Patches" # an existing CUSTOM site you can write to
$BaselineTitle = "Auto - Patches for Windows Test/Dev Clustered Primary- $(Get-Date -Format yyyy-MM-dd)" # change title for secondary and primary
$MaxComponents = 150 # cap to keep the baseline lean; adjust as needed

=== SCHEDULING CONFIG ===

When should the action start? (LOCAL time on this box)

Example: tomorrow at 2am local

$ScheduleStartLocal = (Get-Date).Date.AddHours(18) # +17 for secondary +18 for Primaries

For how long should the action be active after it starts?

(Keeps the action open until Start + RunDuration; clients can run within this window.)

$RunDuration = New-TimeSpan -Hours 2

--- Helpers ---

Add-Type -AssemblyName System.Web
[System.Net.ServicePointManager]::SecurityProtocol = [System.Net.SecurityProtocolType]'Tls12'

Optional (self-signed root) - view BigFix docs on creating a self-signed cert to beef up security

[System.Net.ServicePointManager]::ServerCertificateValidationCallback = { $true }
[System.Net.ServicePointManager]::Expect100Continue = $false

$pair = "{0}:{1}" -f $Username, $Password
$basicAuth = "Basic " + [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes($pair))
$headers = @{ Authorization = $basicAuth }

function Invoke-BESQuery([string]$relevance) {
$encoded = [System.Net.WebUtility]::UrlEncode($relevance)
$uri = "$RootServer/api/query?relevance=$encoded"
[xml](Invoke-RestMethod -Uri $uri -Headers $headers -Method GET -SkipCertificateCheck)
}

--- 1) Get the Patches for Windows site gather URL ---

$relSiteUrl = @'
url of bes site whose (name of it as lowercase starts with "enterprise security")
'@
$siteXml = Invoke-BESQuery $relSiteUrl
$SiteUrl = $siteXml.BESAPI.Query.Result.Answer.'#text'
if (-not $SiteUrl) { throw "Could not resolve Patches for Windows site URL." }

--- 2) Get relevant Fixlets (+ default action id + applicable count) ---

$relFixlets = @'
(

id of it as string
& "|" & (

        if (exists default action of it)

        then content id of default action of it

        else "Action1"

      )

& "|" & (applicable computer count of it as string)

)

of bes fixlets

whose (

fixlet flag of it

and (

    exists default action of it

    or exists action whose (content id of it = "Action2") of it

  )

and name of site of it = "Enterprise Security"

and (name of it as lowercase contains "sql" or name of it as lowercase contains "cumulative" or name of it as lowercase contains "security" or name of it as lowercase contains "update" )

and category of it as lowercase contains "update"
and category of it as lowercase does not contain "unspecified"

and name of it as lowercase does not contain "superseded"

and applicable computer count of it > 0

)
'@
$fixXml = Invoke-BESQuery $relFixlets

and (name of it does not include "office" and name of it does not include "outlook" and name of it does not include "onenote" and name of it does not include "word" and name of it does not include "excel" and name of it does not include "powerpoint")

Parse tuples from /api/query XML

$raw = @()
if ($fixXml.BESAPI.Query.Result.Tuple) {
foreach ($t in $fixXml.BESAPI.Query.Result.Tuple) {
$raw += $t.Answer.'#text'
}
} elseif ($fixXml.BESAPI.Query.Result.Answer) {
$raw += $fixXml.BESAPI.Query.Result.Answer.'#text'
}

$items = $raw |
Where-Object { $_ -and ($_ -match '|') } |
ForEach-Object {
$parts = $_ -split '|',3
[pscustomobject]@{
FixletID = [int]$parts[0]
ActionName = $parts[1] # typically "Action1"
AppCount = [int]$parts[2] # applicable computer count
}
} |
Sort-Object -Property AppCount -Descending

if (-not $items) { throw "No relevant Fixlets found." }

Optional: limit baseline size (can cause performance issues)

$items = $items | Select-Object -First $MaxComponents

--- 3) Build baseline XML (components from source) ---

$components = $items | ForEach-Object {
' ' -f $SiteUrl, $.FixletID, $.ActionName
} | Out-String

$baselineXml = @"

$BaselineTitle
Auto-generated baseline of currently relevant Fixlets from Patches for Windows.
true

      <!-- ===== PRE (single hardcoded fixlet) ===== -->
<BaselineComponentGroup>


<!-- Fill these three placeholders -->
<BaselineComponent IncludeInRelevance="true"
                           SourceSiteURL="http://bigfix.ad.cdirad.com:52311/cgi-bin/bfgather.exe/CustomSite_Rayus_SQL_Patches"
                           SourceID="8321"
                           ActionName="Action1" />
</BaselineComponentGroup>

      <!-- ===== Dynamic components (your current list) ===== -->
<BaselineComponentGroup>


$components
</BaselineComponentGroup>

      <!-- ===== POST (single hardcoded fixlet) ===== -->
<BaselineComponentGroup>


<!-- Fill these three placeholders -->
<BaselineComponent IncludeInRelevance="true"
                           SourceSiteURL="http://bigfix.ad.cdirad.com:52311/cgi-bin/bfgather.exe/CustomSite_Rayus_SQL_Patches"
                           SourceID="8323"
                           ActionName="Action1" />
</BaselineComponentGroup>

</BaselineComponentCollection>
"@

--- 4) POST baseline into your custom site ---

$createUri = "$RootServer/api/baselines/custom/$CustomSiteName"
$response = Invoke-RestMethod -Uri $createUri -Headers $headers -Method POST -ContentType "text/xml" -Body $baselineXml -SkipCertificateCheck

Extract new baseline ID for convenience

[xml]$respXml = $response
$NewBaselineID = ($respXml.BESAPI.'Baseline' | Select-Object -First 1).'ID'
Write-Host "Created baseline '$BaselineTitle' in site '$CustomSiteName' with ID: $NewBaselineID"

--- 5) Build & POST a scheduled Action for the new baseline ---

Convert schedule to UTC and compute end time

$startUtc = $ScheduleStartLocal.ToUniversalTime()
$endUtc = $startUtc + $RunDuration

$startUtcStr = $startUtc.ToString("yyyy-MM-ddTHH:mm:ssZ")
$endUtcStr = $endUtc.ToString("yyyy-MM-ddTHH:mm:ssZ")

#All Computers relevance removed from line 166ish

true

<!-- Or:

-->

$actionXml = @"

$CustomSiteName
$NewBaselineID
Action1

  <CustomRelevance>member of group 1443 of site "ActionSite"</CustomRelevance>
  
</Target>

<Settings>
  <PreActionShowUI>false</PreActionShowUI>

  <HasStartTime>true</HasStartTime>
  <StartDateTimeLocal>$($ScheduleStartLocal.ToString("yyyy-MM-ddTHH:mm:ss"))</StartDateTimeLocal>

  <HasEndTime>true</HasEndTime>
  <EndDateTimeLocal>$($ScheduleStartLocal.Add($RunDuration).ToString("yyyy-MM-ddTHH:mm:ss"))</EndDateTimeLocal>

  <HasDayOfWeekConstraint>false</HasDayOfWeekConstraint>
  <UseUTCTime>false</UseUTCTime>

  <Reapply>false</Reapply>
  <ContinueOnErrors>true</ContinueOnErrors>

  <PostActionBehavior Behavior="Restart">
    <AllowCancel>false</AllowCancel>
    <PostActionDeadlineBehavior>RunAutomatically</PostActionDeadlineBehavior>
    <PostActionDeadlineInterval>PT1M</PostActionDeadlineInterval>
  </PostActionBehavior>

  <IsOffer>false</IsOffer>
</Settings>
"@

replace with and remove lines after and before to not reboot

$actionUri = "$RootServer/api/actions"
$actionResp = Invoke-RestMethod -Uri $actionUri -Headers $headers -Method POST -ContentType "text/xml" -Body $actionXml -SkipCertificateCheck
[xml]$actionXmlResp = $actionResp
$ActionID = ($actionXmlResp.BESAPI.Action | Select-Object -First 1).ID
Write-Host "Created scheduled action $ActionID. Starts (UTC): $startUtcStr Ends (UTC): $endUtcStr"

There's a lot going on there...which step is the one that's failing?

Have you considered a Server Automation plan as an alternative to control the sequence of failovers ?
Something like this… now the “Cumulative Update” steps 109 and 110 are just dummy actions for demo and would be substituted with your Baseline.

1 Like

It's not failing the db back at the end to the primary node.

is there a different way you would recommend to do this? our current SOP is for these clusters - all of them are three node, a DR, secondary and primary- where after each wave of patches (OS plus SQL) the DBAs validate that everything is good before proceeding to the next, at least for production clusters. Test/Dev we allow to just run, one wave per hour during the patch window. the more "dummy proof" the better if it can be managed. there's also the desire for as much of this to be schedulable.. that was the driver behind going with a script vs just using baselines.

If there are manual steps involved, one possible option is to use BigFix properties to constrain the action.

As an example, you could apply 1 baseline action to all 3 servers in the cluster, but require a specific property to evaluate to “TRUE” to run.

The first server patching would have this set to TRUE, then after DBA checks out, they would then mark the second server to TRUE, the action would then start/complete. DBA team checks out serer #2, and then repeat again for the third server.

Value of TRUE could be triggered on many options. Sky is the limit. File exists, reg key exists, some value/data in the file/regkey, etc. Lots of options. Possibly even combine it with a time stamp so it would ensure the property would not default to TRUE the next month, etc.

this is interesting - I've posed the question to my teammates. if I'm reading you right, after the baseline is invoked, the only human interaction after that would be for the on deck DBA to change the required value to TRUE, and the rest would happen automagically?

Human change would be easiest of course….

It’s highly probable there would be a way to automate rotating through the servers, so that only 1 can run at a time.

As noted, you just need a way for an action to look for a property and have it return true/false in the correct order. A possible combination of time stamps, regkeys, file/file contents are the most common examples. I say true/false as it’s the easiest, but anything is possible.

Note: This is essentially what happens with Server Automation if you have that license and have set it up. We have it, I have looked at it, but have never used it in production (yet) just due to our roles/responsibilities and time/prioritization.