Are there practical limits on the downloads\sha1 size on the BES Root Server (version 9.5.4)? I’m in an airgapped DSA environment and have had repeated failures (every few days) on the FillDB service. The service appears to be running, but FillDBData\BufferDir is not getting processed and client reports are not updating when this happens.
Currently my downloads\sha1 folder has just over 54,000 files for about 1.5 TB of storage. We’re also using the wwwrootbes/Uploads folder pretty heavily, for both custom content and files related to BigFix OS Deployment (images, drivers, etc.)
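For anyone wanting to compare numbers, here is a rough sketch of how the file count and total size of a cache folder could be measured. It only walks a directory tree and sums file sizes; the function name `cache_stats` and the idea of passing the folder path on the command line are my own assumptions, not anything BigFix provides.

```python
import os

def cache_stats(root):
    """Return (file_count, total_bytes) for every file under root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
                count += 1
            except OSError:
                # File vanished or is unreadable between listing and stat; skip it.
                continue
    return count, total

if __name__ == "__main__":
    import sys
    # Hypothetical usage: pass the path to the downloads\sha1 folder as argument 1.
    if len(sys.argv) > 1:
        files, size = cache_stats(sys.argv[1])
        print(f"{files} files, {size / 1e12:.2f} TB, "
              f"{size / max(files, 1) / 1e6:.1f} MB average per file")
```

For scale, 1.5 TB across 54,000 files works out to an average of roughly 28 MB per file.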
Watching in Process Monitor, BESRootServer is spending most of its time in the DownloadCache.db and DownloadState.db files.
I understand there are some improvements, particularly to FillDB, in 9.5.5 but I’m planning to watch-and-wait for another week or two before upgrading. If there are practical limits and I’m exceeding them, I’d also like to know whether/how those limits change in 9.5.5 …
A <-> B <-> C
A and C have no link to each other, they both replicate through B.
A <-> B is dual 1 Gbps links, < 1 ms latency, testing at about 700 Mbps via iPerf.
B <-> C is a 600 Mbps, VPN’d WAN link, 28 ms latency, testing at about 450 Mbps via iPerf.
We’ve been in this configuration for about four years now (IBM Pro Services helped with the initial setup), but our download cache sizes have steadily increased. As “B” and “C” cannot do Internet downloads, they have to precache everything; and the DownloadCachers care not for “Relevant” or “Not Relevant”, so everything that can be downloaded, has been, and remains in our cache for eternity.
In this architecture, which of the FillDB instances is/are exhibiting the issue?
On a somewhat related note, the download cacher can take as an option a file containing the desired download URLs (see the -o command-line argument). Such a file can be generated based on a number of conditions, including applicability in your environment. This might be one way to cut down on the download, storage, and subsequent replication requirements. Here’s some sample session relevance to help generate such a file:
("site=null fixletid=null actionid=null url=" & it & " type=download size=null sha1=null sha256=null") of (unique values of ((matches (case insensitive regex "((mailto\:|(news|(ht|f)tp(s?))\:\/\/){1}\S+)") of matches (case insensitive regex "^(download|prefetch|download now|download now as|add prefetch item).*$") of scripts of actions whose (exists script of it) of it) as string) of fixlets whose (applicable computer count of it > 0) of bes sites whose (true))
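To make it easier to see what that relevance is doing, here is a minimal Python sketch of the same two-step idea applied to a single action script: first pick out the lines that begin with a download command, then pull the URLs out of those lines and wrap each one in the `site=null ... sha256=null` line format. The function name `download_list_lines` and the sample script are hypothetical, and the use of `re.MULTILINE` is an assumption about how the relevance regex treats line starts. It is an illustration only, not a replacement for running the query in Web Reports or the console.

```python
import re

# Patterns copied from the session relevance above. re.MULTILINE makes ^ and $
# apply per script line (an assumption about the relevance regex behaviour).
COMMAND_RE = re.compile(
    r"^(download|prefetch|download now|download now as|add prefetch item).*$",
    re.IGNORECASE | re.MULTILINE,
)
URL_RE = re.compile(r"((mailto\:|(news|(ht|f)tp(s?))\:\/\/){1}\S+)", re.IGNORECASE)

# Same line layout the relevance emits, with every field but the URL set to null.
LINE_TEMPLATE = ("site=null fixletid=null actionid=null url={url} "
                 "type=download size=null sha1=null sha256=null")

def download_list_lines(action_script):
    """Return sorted, de-duplicated download-list lines for one action script."""
    urls = set()
    for command in COMMAND_RE.finditer(action_script):
        for url in URL_RE.finditer(command.group(0)):
            urls.add(url.group(0))
    return [LINE_TEMPLATE.format(url=u) for u in sorted(urls)]

# Hypothetical action script used only to exercise the function.
sample_script = (
    "prefetch a.exe sha1:0 size:1 http://example.com/a.exe\n"
    "download http://example.com/b.cab\n"
    "// comment mentioning http://example.com/ignored\n"
)
for line in download_list_lines(sample_script):
    print(line)
```

The comment line is skipped because it does not start with a download command, so only the two real URLs end up in the list.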
We more frequently have problems on “B”, the hub in this hub-and-spoke, but occasionally also have problems on “A”, which has more clients reporting and where we take most of the actions. “C” is in the remote site and hasn’t had any of these hangups, but it also has no clients reporting to it under normal circumstances. It’s our disaster recovery site, configured to replicate DSA only every two hours or so, and even when we activate “C” it would still only have a hundred or so clients reporting to it.
Thanks for the session relevance, I’ll be giving that a try in the next couple of days. Using that we might be able to reduce our sha1 size considerably! I wasn’t aware you could use all of the null values for parameters.
We’ll still need to process the RHSMDownloadCacher separately as it builds a full repo, but at least the repo normally lives outside of sha1 so it shouldn’t consume as much processing to handle sha1 cache cleanup.
Thanks again for this relevance query, it’s the starting point of something I’m building into Web Reports to replace my use of the BESDownloadCacher. I do want to point out an improvement in your regular expression though.
regex "^(download|prefetch|download now|download now as|add prefetch item).*$" assumes that the download command starts at the first character of the line. There can be whitespace in front of the download command, and the original pattern doesn’t seem to match those lines. I’ve modified your regex to regex "^[[:space:]]*(download|prefetch|download now|download now as|add prefetch item).*$"
which accounts for the possibility of leading whitespace.
In my environment, after I filtered out the “MANUAL_BES_CACHING_REQUIRED” results, your regex gave me 1,105 results, while my modified form gives 1,370.
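Here is a small Python sketch of why the two patterns give different counts. Python’s `re` module has no POSIX `[[:space:]]` class, so `\s` stands in for it here (an assumption that the two are close enough for this purpose). The sample script lines and the helper name `count_download_lines` are made up for illustration.

```python
import re

COMMANDS = r"(download|prefetch|download now|download now as|add prefetch item)"

# Original form: the command must start at column 0.
STRICT = re.compile(r"^" + COMMANDS + r".*$", re.IGNORECASE)
# Modified form: optional leading whitespace (\s approximates [[:space:]]).
TOLERANT = re.compile(r"^\s*" + COMMANDS + r".*$", re.IGNORECASE)

def count_download_lines(script, pattern):
    """Count the script lines that the given pattern treats as download commands."""
    return sum(1 for line in script.splitlines() if pattern.match(line))

# Hypothetical action script: one flush-left command, two indented, one comment.
SAMPLE = (
    "prefetch a.exe sha1:0 size:1 http://example.com/a.exe\n"
    "    download http://example.com/b.cab\n"
    "\tadd prefetch item name=c.msi sha1=0 size=1 url=http://example.com/c.msi\n"
    "// download mentioned in a comment\n"
)

print(count_download_lines(SAMPLE, STRICT))    # 1
print(count_download_lines(SAMPLE, TOLERANT))  # 3
```

The indented commands are the ones the original pattern misses, while the comment line is rejected by both forms.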
So I’m trying to resurrect some discussion on this.
On a server with a 1.6 TB sha1 cache, I ran Process Monitor (procmon) for several minutes to watch the BESRootServer.exe process. I could only capture a couple of minutes at a time before running out of memory, but during a 2 1/2 minute (clock time) capture, what I found repeatedly was:
BESRootServer.exe spent 13.1 seconds in file i/o operations
7.9 seconds of that was spent in BES Server\Mirror Server\Config\DownloadState.db, DownloadCache.db-journal, or DownloadCache.db
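To put those capture figures in proportion, here is the arithmetic as a short Python sketch. The only inputs are the numbers quoted above (a 2 1/2 minute capture, 13.1 seconds of file i/o, 7.9 seconds in the three database files); the variable names are mine.

```python
capture_seconds = 2.5 * 60   # 2 1/2 minutes of clock time
file_io_seconds = 13.1       # BESRootServer.exe time in file i/o
db_io_seconds = 7.9          # portion spent in the DownloadState/DownloadCache files

db_share_of_io = 100 * db_io_seconds / file_io_seconds
io_share_of_capture = 100 * file_io_seconds / capture_seconds
db_share_of_capture = 100 * db_io_seconds / capture_seconds

print(f"DB files as share of file i/o time: {db_share_of_io:.1f}%")      # 60.3%
print(f"File i/o as share of capture time:  {io_share_of_capture:.1f}%")  # 8.7%
print(f"DB files as share of capture time:  {db_share_of_capture:.1f}%")  # 5.3%
```

So about 60% of the file i/o time goes to those three files, although file i/o as a whole is under 9% of the wall-clock capture.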
I’m using the session relevance from earlier in this thread to abandon using BESDownloadCacher, and instead use relevance to generate a download list of only Relevant fixlet patches. This should let me shrink the sha1 cache size.
So, I’d like to know whether I can expect some performance payback on this. Do the file i/o operations on the DownloadState.db, DownloadCache.db-journal, and DownloadCache.db database scale with the number of files in sha1, or do they scale with the number of downloads requested by clients, or something else?
Once I’ve shrunk the sha1 cache size, is there any benefit to resetting these database files? If so, how? For example, if I stop the root server, delete the files, and restart the root server, will they be recreated?