9.5.7 DSA replication issue - OpenSSLError FlushBIOFailed

I just upgraded my DSA servers to 9.5.7, and initial replication seemed to be taking a long time. After about 4 hours of watching the BESAdminTool replication status loop through three states (‘Completed NextSites…Completed FixletResults…Completed PropertyResults’) without finishing, I enabled FillDB debugging.

In FillDB.log at debug level, I see repeated entries of:
UploadMonitor: OpenSSLError while replicating file [a custom file under Uploads]: class FlushBIOFailed
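
For anyone wanting to check their own log for the same symptom, here is a minimal Python sketch that counts these entries per file. It assumes the debug lines look like the one quoted above; the regular expression is built from that single example, and the function name `count_openssl_errors` is just something made up for illustration. Real log lines may carry timestamps or other prefixes, so it searches within each line rather than matching the whole line.

```python
import re
from collections import Counter

# Pattern derived from the one log line quoted in this post (an assumption:
# other FillDB versions may word the message differently).
PATTERN = re.compile(
    r"UploadMonitor: OpenSSLError while replicating file (?P<file>.+): class (?P<cls>\w+)"
)

def count_openssl_errors(lines):
    """Count OpenSSLError entries, keyed by (file path, error class)."""
    counts = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            counts[(m.group("file").strip(), m.group("cls"))] += 1
    return counts
```

Passing an open file handle for FillDB.log as `lines` and printing `count_openssl_errors(fh).most_common()` would show which uploaded files keep failing and how often.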

I’ve never seen this one before. The file is a ZIP archive for a custom Firefox extension we use internally. It was originally placed in the Uploads folder on August 10, 2017, and from there it had replicated among all the DSA servers. We were on 9.5.6 at the time.

I removed this file from the Uploads folder, restarted FillDB, and now it looks like it may be progressing.

(We really need error logging turned on by default for FillDB, and reported in the Admin Tool.)

I’m about to leave for the holiday, but I’ll try to open a PMR next week. Any suggestions here in the forum? (I can read the forum while traveling, but won’t be able to connect to my servers to support PMR data collection.)


Well, replication still doesn’t seem to be completing. It keeps looping between the same table states, and also to ‘Replication was interrupted to process local server database insertions’. I can’t find any errors in FillDB.log, but it seems to be looping over the same queries. I’m not seeing updated client status for actions on the DSA server.

Is there a way to manually trigger DSA replication? To try and troubleshoot this, I’m setting my replication interval to a long value (1 hour), and UninterruptableReplicationSeconds to 720. But I don’t want to wait an hour between replication attempts while I’m sitting here and watching the logs.


Has anyone else with DSA configuration upgraded to 9.5.7 yet? How were your experiences?

I’m still not replicating successfully and will open a PMR when I get back to town Sunday.

I have replication working in one direction now at least. Turns out
UnInterruptibleReplicationSeconds
is NOT the same as
UnInterruptableReplicationSeconds
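
A one-letter slip like this is easy to catch mechanically. Below is a rough Python sketch that compares the setting names you have configured against a list of names you expect, and flags anything that is close but not identical. The helper name `find_near_misses`, the 0.85 similarity cutoff, and the case-insensitive comparison are all assumptions for illustration, and the list of expected names has to come from the product documentation, not from this sketch.

```python
import difflib

def find_near_misses(configured, expected, cutoff=0.85):
    """Map each configured name that nearly matches an expected name
    (but is not equal to it, ignoring case) to that expected name."""
    expected_lower = {name.lower(): name for name in expected}
    suspects = {}
    for name in configured:
        if name.lower() in expected_lower:
            continue  # same name apart from letter case, nothing to flag
        close = difflib.get_close_matches(
            name.lower(), list(expected_lower), n=1, cutoff=cutoff
        )
        if close:
            suspects[name] = expected_lower[close[0]]
    return suspects
```

Given the "-able" spelling as a configured name and the "-ible" spelling as the expected one, the function would report the former as a likely typo for the latter.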

I don’t have a network path to the other server while traveling, but I expect that increasing the right value may fix it there too.

I found no indication of this in the FillDB debug logs, but after enabling FillDB Performance logging I noticed that replication was still being interrupted after exactly 30 seconds, even though I had set the (wrong) value to a longer duration.

With DSA replication being interrupted every cycle, I think it would never complete. That is, there may be no such thing as a ‘partial’ replication that makes a little more progress each cycle. Is that the case?