DSA Replication Failures

Hello BigFix experts. I’ve got an issue with DSA (Distributed Server Architecture) replication between two master servers. Replication used to work between two servers without any problems. I then decommissioned one of the servers, built a new one at another site, and tried to set up replication again.

Replication still isn’t working between the two servers. The FillDB.log on each server seems to point to a SQL-related error:

Server1 FillDB.log:

Thu, 04 Jan 2018 16:01:54 -0500 – 1452 – Unexpected exception: Database Error: [Microsoft][SQL Server Native Client 11.0]Unspecified error occurred on SQL Server. Connection may have been terminated by the server. (S1000: 0)
Thu, 04 Jan 2018 16:02:16 -0500 – 1456 – FillDB version 9.5.5.193 starting…
Thu, 04 Jan 2018 16:02:22 -0500 – 1456 – OpenSSL Initialized (Non-FIPS Mode)
Thu, 04 Jan 2018 16:02:22 -0500 – 1456 – Using OpenSSL crypto library libBEScrypto64 - OpenSSL 1.0.2j 26 Sep 2016
Thu, 04 Jan 2018 16:02:22 -0500 – 1456 – Signature Algorithms: sha256
Thu, 04 Jan 2018 16:02:22 -0500 – 1456 – Download Algorithms: sha256
Thu, 04 Jan 2018 16:02:22 -0500 – 1456 – Signature Algorithms: sha256
Thu, 04 Jan 2018 16:02:22 -0500 – 1456 – Download Algorithms: sha256
Sun, 07 Jan 2018 03:40:27 -0500 – 1456 – Replication failed for server ‘SERVER2.ourdomain.X’: A replication lock request for ADMINFIELDS (Exclusive) timed out.
Sun, 07 Jan 2018 21:12:17 -0500 – 1456 – Replication failed for server ‘SERVER2.ourdomain.X’: A replication lock request for QUESTIONRESULTS (Exclusive) timed out.
Tue, 09 Jan 2018 10:43:20 -0500 – 3780 – Replication connection attempt failed for server ‘SERVER2.ourdomain.X’: Database Error: [Microsoft][ODBC SQL Server Driver][DBNETLIB]SQL Server does not exist or access denied. (08001: 17)
[Microsoft][ODBC SQL Server Driver][DBNETLIB]ConnectionOpen (Connect()). (01000: 53)
Tue, 09 Jan 2018 10:49:14 -0500 – 1848 – Replication connection attempt failed for server ‘SERVER2.ourdomain.X’: Database Error: [Microsoft][ODBC SQL Server Driver][DBNETLIB]SQL Server does not exist or access denied. (08001: 17)
[Microsoft][ODBC SQL Server Driver][DBNETLIB]ConnectionOpen (Connect()). (01000: 53)
Thu, 11 Jan 2018 15:00:19 -0500 – 1456 – Replication failed for server ‘SERVER2.ourdomain.X’: A replication lock request for QUESTIONRESULTS (Exclusive) timed out.
Sat, 13 Jan 2018 05:38:00 -0500 – 1456 – Replication failed for server ‘SERVER2.ourdomain.X’: A replication lock request for QUESTIONRESULTS (Exclusive) timed out.
Sat, 13 Jan 2018 18:05:07 -0500 – 1456 – Replication failed for server ‘SERVER2.ourdomain.X’: A replication lock request for QUESTIONRESULTS (Exclusive) timed out.
Mon, 15 Jan 2018 08:49:33 -0500 – 1456 – Replication failed for server ‘SERVER2.ourdomain.X’: A replication lock request for QUESTIONRESULTS (Exclusive) timed out.

Server2 FillDB.log:

Thu, 04 Jan 2018 16:01:51 -0500 – 4800 – Replication failed for server ‘SERVER1.ourdomain.X’: Database Error: [Microsoft][SQL Server Native Client 11.0]Shared Memory Provider: No process is on the other end of the pipe.
(08S01: 233)
[Microsoft][SQL Server Native Client 11.0]Communication link failure (08S01: 233)
Thu, 04 Jan 2018 16:01:51 -0500 – 4800 – Unable to connect to database: Database Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure (08S01: 0)
Thu, 04 Jan 2018 16:02:02 -0500 – 1340 – FillDB version 9.5.5.193 starting…
Thu, 04 Jan 2018 16:02:07 -0500 – 1340 – OpenSSL Initialized (Non-FIPS Mode)
Thu, 04 Jan 2018 16:02:07 -0500 – 1340 – Using OpenSSL crypto library libBEScrypto64 - OpenSSL 1.0.2j 26 Sep 2016
Thu, 04 Jan 2018 16:02:07 -0500 – 1340 – Signature Algorithms: sha256
Thu, 04 Jan 2018 16:02:07 -0500 – 1340 – Download Algorithms: sha256
Thu, 04 Jan 2018 16:02:07 -0500 – 1340 – Signature Algorithms: sha256
Thu, 04 Jan 2018 16:02:07 -0500 – 1340 – Download Algorithms: sha256
Thu, 04 Jan 2018 16:02:30 -0500 – 1692 – Replication connection attempt failed for server ‘SERVER1.ourdomain.X’: Database Error: [Microsoft][ODBC SQL Server Driver]Login timeout expired (S1T00: 0)
Tue, 09 Jan 2018 10:51:08 -0500 – 936 – Replication connection attempt failed for server ‘SERVER1.ourdomain.X’: Database Error: [Microsoft][ODBC SQL Server Driver][DBNETLIB]SQL Server does not exist or access denied. (08001: 17)
[Microsoft][ODBC SQL Server Driver][DBNETLIB]ConnectionOpen (Connect()). (01000: 53)


I’m sure both servers run the same SQL Server version and patch level, and they should be set up roughly the same, including permissions. Do these errors point anyone to something obvious I should check?

Thanks tremendously for any hints anyone might be able to provide!


Check the REPLICATION_SERVERS table in the BFEnterprise database, and make sure the URL field for each server contains the correct, resolvable host name.
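One way to double-check the URL field is to copy each URL out of REPLICATION_SERVERS and confirm that its host name resolves from the server you are on. This is a minimal Python sketch using only the standard library; the sample URL (port 52311 and the path) is a hypothetical placeholder, not a value read from a real table.

```python
import socket
from urllib.parse import urlsplit


def host_from_url(url: str) -> str:
    """Return the host-name part of a URL copied from REPLICATION_SERVERS.

    urlsplit lower-cases the host name, which does not matter for DNS.
    """
    host = urlsplit(url).hostname
    if not host:
        raise ValueError(f"no host name found in {url!r}")
    return host


def resolve(host: str) -> list[str]:
    """Return the distinct IP addresses the local resolver gives for host.

    An empty list means the name did not resolve from this machine.
    """
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    # info[4] is the sockaddr tuple; its first element is the address string
    return sorted({info[4][0] for info in infos})


# Hypothetical example URL: replace with the values from your own table.
EXAMPLE_URL = "http://SERVER2.ourdomain.X:52311/example/path"
```

For example, on SERVER1 run `resolve(host_from_url(url))` for SERVER2's URL (and vice versa). An empty list, or an address you don't expect, would point at a DNS or naming problem rather than at SQL Server itself.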

Related (potential problem and fix if you are using named instance databases):
http://www-01.ibm.com/support/docview.wss?uid=swg21974540

Hello and thank you very much for your reply!

I checked the REPLICATION_SERVERS table earlier and confirmed that the host names in the URL fields are exactly correct.

We are not using named instances, so that advice doesn’t apply (I tried it anyway, and it didn’t help).

For what it’s worth, I cleared the logs and rebooted both servers again today. Here are the new FillDB logs:

Server 1:
Thu, 18 Jan 2018 08:28:00 -0500 – 1392 – FillDB Stop Requested.
Thu, 18 Jan 2018 08:28:02 -0500 – 1456 – Replication failed for server ‘SERVER1.ourdomain.X’: Database Error: [Microsoft][SQL Server Native Client 11.0]TCP Provider: The specified network name is no longer available.
(08S01: 64)
[Microsoft][SQL Server Native Client 11.0]Communication link failure (08S01: 64)
Thu, 18 Jan 2018 08:28:03 -0500 – 1456 – Unable to connect to database: Database Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure (08S01: 0)
Thu, 18 Jan 2018 08:28:23 -0500 – 1464 – FillDB version 9.5.5.193 starting…
Thu, 18 Jan 2018 08:28:28 -0500 – 1464 – OpenSSL Initialized (Non-FIPS Mode)
Thu, 18 Jan 2018 08:28:28 -0500 – 1464 – Using OpenSSL crypto library libBEScrypto64 - OpenSSL 1.0.2j 26 Sep 2016
Thu, 18 Jan 2018 08:28:28 -0500 – 1464 – Signature Algorithms: sha256
Thu, 18 Jan 2018 08:28:28 -0500 – 1464 – Download Algorithms: sha256
Thu, 18 Jan 2018 08:28:28 -0500 – 1464 – Signature Algorithms: sha256
Thu, 18 Jan 2018 08:28:28 -0500 – 1464 – Download Algorithms: sha256
Thu, 18 Jan 2018 08:28:29 -0500 – 1464 – Initial replication from P110-IEMMAST01.directcash.net using to log on BFEnterprise
Thu, 18 Jan 2018 08:28:37 -0500 – 1464 – Replication failed for server ‘SERVER2.ourdomain.X’: A replication lock request for QUESTIONRESULTS (Exclusive) timed out.

Server2:
Thu, 18 Jan 2018 08:27:55 -0500 – 1312 – FillDB Stop Requested.
Thu, 18 Jan 2018 08:28:00 -0500 – 1340 – Replication failed for server ‘SERVER1.ourdomain.X’: Database Error: [Microsoft][SQL Server Native Client 11.0]Unspecified error occurred on SQL Server. Connection may have been terminated by the server. (S1000: 0)
Thu, 18 Jan 2018 08:28:00 -0500 – 1340 – Unable to connect to database: Database Error: [Microsoft][SQL Server Native Client 11.0]TCP Provider: An existing connection was forcibly closed by the remote host.
(08S01: 10,054)
[Microsoft][SQL Server Native Client 11.0]Communication link failure (08S01: 10,054)
Thu, 18 Jan 2018 08:28:12 -0500 – 1344 – FillDB version 9.5.5.193 starting…
Thu, 18 Jan 2018 08:28:14 -0500 – 1344 – Unable to connect to database: Database Error: [Microsoft][SQL Server Native Client 11.0]Client unable to establish connection because an error was encountered during handshakes before login. Common causes include client attempting to connect to an unsupported version of SQL Server, server too busy to accept new connections or a resource limitation (memory or maximum allowed connections) on the server. (08001: 26)
[Microsoft][SQL Server Native Client 11.0]Client unable to establish connection (08001: 26)
[Microsoft][SQL Server Native Client 11.0]TCP Provider: The specified network name is no longer available.
(08001: 64)
[Microsoft][SQL Server Native Client 11.0]Client unable to establish connection due to prelogin failure (08001: 64)
Thu, 18 Jan 2018 08:28:24 -0500 – 1344 – OpenSSL Initialized (Non-FIPS Mode)
Thu, 18 Jan 2018 08:28:24 -0500 – 1344 – Using OpenSSL crypto library libBEScrypto64 - OpenSSL 1.0.2j 26 Sep 2016
Thu, 18 Jan 2018 08:28:24 -0500 – 1344 – Signature Algorithms: sha256
Thu, 18 Jan 2018 08:28:24 -0500 – 1344 – Download Algorithms: sha256
Thu, 18 Jan 2018 08:28:24 -0500 – 1344 – Signature Algorithms: sha256
Thu, 18 Jan 2018 08:28:24 -0500 – 1344 – Download Algorithms: sha256
Thu, 18 Jan 2018 08:28:28 -0500 – 1344 – Initial replication from SERVER1.ourdomain.X using to log on BFEnterprise

I’m completely stumped by this issue. If anyone has other suggestions, I’d be happy to try them. Thanks!

First, make sure you have working SQL connectivity: on each server, use SQL Server Management Studio to connect to the other server’s SQL instance.

Hi, Jason! Thank you very much for your reply.

From both servers, I can open SSMS and successfully connect to the other server (using the same service account our BigFix services run under).

I’ve also used a Microsoft utility (Kerberos Configuration Manager for SQL Server) to verify that the SPNs on both servers are valid.

The ODBC Data Source Administrator (both 32-bit and 64-bit) also completes successful connection tests on both servers.

Make sure traffic on port 1433 (the default SQL Server port) is allowed both inbound and outbound between the replication servers.
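A quick way to check the firewall side of this is a plain TCP connect from each server to the other on 1433. Here is a minimal Python sketch, assuming a default SQL Server instance listening on the default port; it only shows that the port is reachable, not that SQL logins or permissions work.

```python
import socket


def port_is_open(host: str, port: int = 1433, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This proves only network reachability (firewalls, routing, listener);
    it does not log in to SQL Server.
    """
    try:
        # create_connection resolves the name and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures
        return False
```

For example, run `port_is_open("SERVER2.ourdomain.X")` on SERVER1 and `port_is_open("SERVER1.ourdomain.X")` on SERVER2; both directions need to return `True`.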

Make sure the maximum server memory setting is not set too low on either SQL Server instance (https://technet.microsoft.com/en-us/library/ms191144(v=sql.105).aspx)

Consider upgrading the BigFix Server components on the replication servers from version 9.5.5 to version 9.5.8 to avoid this bug:
Issue 154276 - APAR IJ00274 - FILLDB PERFORMANCE DEGRADATION, DUE TO LONG TIME REPORT PARSING
Ref: http://support.bigfix.com/bes/changes/fullchangelist-95.txt

Thanks again for your reply, Jason. I really appreciate it!

  • I increased the maximum server memory from 2500 MB to 3000 MB on both servers.
  • Patching is running this week, so I’ll try upgrading the environment next week to see whether that helps.

Okay, I finished upgrading both servers to 9.5.8.38, but the replication status has not improved. Similar messages still appear in the FillDB logs, which I’ve included below in case anyone has other suggestions:

SERVER1:

Mon, 29 Jan 2018 09:22:05 -0500 – 1232 – FillDB Stop Requested.
Mon, 29 Jan 2018 09:22:06 -0500 – 1240 – Replication failed for server ‘SERVER2.ourdomain.X’: Database Error: [Microsoft][SQL Server Native Client 11.0]Shared Memory Provider: The pipe has been ended.
(08S01: 109)
[Microsoft][SQL Server Native Client 11.0]Communication link failure (08S01: 109)
Mon, 29 Jan 2018 09:22:06 -0500 – 1240 – Unable to connect to database: Database Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure (08S01: 0)
Mon, 29 Jan 2018 09:29:32 -0500 – 1328 – FillDB version 9.5.8.38 starting…
Mon, 29 Jan 2018 09:29:35 -0500 – 1328 – OpenSSL Initialized (Non-FIPS Mode)
Mon, 29 Jan 2018 09:29:35 -0500 – 1328 – Using OpenSSL crypto library libBEScrypto64 - OpenSSL 1.0.2j-fips 26 Sep 2016
Mon, 29 Jan 2018 09:29:35 -0500 – 1328 – Signature Algorithms: sha256
Mon, 29 Jan 2018 09:29:35 -0500 – 1328 – Download Algorithms: sha256
Mon, 29 Jan 2018 09:29:35 -0500 – 1328 – Signature Algorithms: sha256
Mon, 29 Jan 2018 09:29:35 -0500 – 1328 – Download Algorithms: sha256
Mon, 29 Jan 2018 09:29:51 -0500 – 2124 – Replication connection attempt failed for server ‘SERVER2.ourdomain.X’: Database Error: [Microsoft][SQL Server Native Client 11.0]Named Pipes Provider: Could not open a connection to SQL Server [53]. (08001: 53)
[Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (08001: 53)
[Microsoft][SQL Server Native Client 11.0]Login timeout expired (S1T00: 0)

SERVER2:

Mon, 29 Jan 2018 09:23:01 -0500 – 1364 – FillDB Stop Requested.
Mon, 29 Jan 2018 09:23:02 -0500 – 1380 – Replication failed for server ‘SERVER1.ourdomain.X’: Database Error: [Microsoft][SQL Server Native Client 11.0]Shared Memory Provider: No process is on the other end of the pipe.
(08S01: 233)
[Microsoft][SQL Server Native Client 11.0]Communication link failure (08S01: 233)
Mon, 29 Jan 2018 09:23:03 -0500 – 1380 – Unable to connect to database: Database Error: [Microsoft][SQL Server Native Client 11.0]Communication link failure (08S01: 0)
Mon, 29 Jan 2018 09:23:17 -0500 – 1368 – FillDB version 9.5.8.38 starting…
Mon, 29 Jan 2018 09:23:24 -0500 – 1368 – OpenSSL Initialized (Non-FIPS Mode)
Mon, 29 Jan 2018 09:23:24 -0500 – 1368 – Using OpenSSL crypto library libBEScrypto64 - OpenSSL 1.0.2j-fips 26 Sep 2016
Mon, 29 Jan 2018 09:23:24 -0500 – 1368 – Signature Algorithms: sha256
Mon, 29 Jan 2018 09:23:24 -0500 – 1368 – Download Algorithms: sha256
Mon, 29 Jan 2018 09:23:24 -0500 – 1368 – Signature Algorithms: sha256
Mon, 29 Jan 2018 09:23:24 -0500 – 1368 – Download Algorithms: sha256
Mon, 29 Jan 2018 09:24:10 -0500 – 836 – Replication connection attempt failed for server ‘SERVER1.ourdomain.X’: Database Error: [Microsoft][SQL Server Native Client 11.0]Named Pipes Provider: Could not open a connection to SQL Server [53]. (08001: 53)
[Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (08001: 53)
[Microsoft][SQL Server Native Client 11.0]Login timeout expired (S1T00: 0)
Mon, 29 Jan 2018 09:30:01 -0500 – 2416 – Replication connection attempt failed for server ‘SERVER1.ourdomain.X’: Database Error: [Microsoft][SQL Server Native Client 11.0]Named Pipes Provider: Could not open a connection to SQL Server [53]. (08001: 53)
[Microsoft][SQL Server Native Client 11.0]A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online. (08001: 53)
[Microsoft][SQL Server Native Client 11.0]Login timeout expired (S1T00: 0)

The issue above is now resolved (through a BigFix support ticket), so I thought I’d share a few observations for anyone who finds themselves in the same situation.

  1. Seeing “database not initialized” or “actionsites not created” in the Admin tool doesn’t necessarily mean replication is broken. I’m fairly sure mine was working, at least partially, because content I created on one server showed up on the other. Ideally, though, you want to see “replication successful” on both servers.

  2. The SQL-related errors above are largely a red herring: rebooting a server is enough to produce all of them, because they can appear while the application or database is not fully up. So although I focused on those errors (hence this post), they were ultimately not the real issue.

  3. So what was I doing wrong? Per IBM Support, I made two mistakes across my multiple reinstallation attempts. First, when providing a masthead file to the installer, don’t use the original masthead. I started with BigFix in 2014, so my original masthead file dates from 2014. Instead, IBM Support told me to copy the ActionSite (masthead) file with a recent datestamp from my installed master server and use that.

  4. Second, every time I got fed up with my replication server install, I used the BESRemove tool to clean up before reinstalling. That sounds reasonable, but when we checked the registry, a number of sites (such as the _WWW ones) were missing from under the \Enterprise Client key. So instead of cleaning up and reinstalling, IBM Support advised me to run the installer over the top of the existing replication server install, and that actually seemed to fix it. Another telltale sign: after fresh installs following BESRemove, the Admin tool showed “all the tabs”, whereas a correctly installed replication server shows fewer tabs in the Admin tool.

  5. I had a separate issue with the crypto keys. In the \BES Server directory there are five files named like Encrypted****Key. I had to recreate these files, decrypt them, copy them to the replication server, and finally re-encrypt them on the replication server.
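Point 2 is easy to check against your own logs: errors clustered around a FillDB stop or start are likely restart noise, while errors far from any restart deserve a closer look. Below is a rough heuristic sketch in Python. It assumes the log line format shown in this thread (timestamp, thread ID, message); the 2-minute window and the keyword lists are arbitrary assumptions, not anything FillDB itself defines.

```python
import re
from datetime import datetime, timedelta

# One FillDB.log entry: "<timestamp> - <thread id> - <message>".
# The forum rendered the separators as en dashes, so accept "-" or "–".
ENTRY = re.compile(
    r"^(?P<ts>\w{3}, \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2} [+-]\d{4})"
    r" [-–] (?P<tid>\d+) [-–] (?P<msg>.*)$"
)
TS_FORMAT = "%a, %d %b %Y %H:%M:%S %z"
# Assumed markers and error keywords, based on the log lines in this thread
RESTART_MARKERS = ("FillDB Stop Requested", "FillDB version")
ERROR_HINTS = ("failed", "Unable to connect", "Unexpected exception")


def parse(lines):
    """Yield (timestamp, message) for lines that start a log entry.

    Continuation lines such as "[Microsoft]..." or "(08S01: 233)" have no
    timestamp and are skipped.
    """
    for line in lines:
        m = ENTRY.match(line.strip())
        if m:
            yield datetime.strptime(m["ts"], TS_FORMAT), m["msg"]


def split_errors(lines, window=timedelta(minutes=2)):
    """Split error entries into (near_restart, other).

    An error counts as near_restart if it was logged within `window`
    (before or after) of any FillDB stop/start marker.
    """
    entries = list(parse(lines))
    restarts = [ts for ts, msg in entries if msg.startswith(RESTART_MARKERS)]
    near_restart, other = [], []
    for ts, msg in entries:
        if not any(hint in msg for hint in ERROR_HINTS):
            continue
        if any(abs(ts - r) <= window for r in restarts):
            near_restart.append((ts, msg))
        else:
            other.append((ts, msg))
    return near_restart, other
```

For example, point `split_errors` at an open FillDB.log file and look mainly at the `other` list. Because the window is only a guess, treat the split as a starting point for reading the log, not as a verdict.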

In conclusion, the main thing that (oddly enough) seemed to resolve the issues above, even though I had done BESRemove/reinstall multiple times, was to run the BES Server installer over the top of the existing DSA/replication server installation.

Have a great day!
