spamassassin-sysadmins mailing list archives

From Fossies Administrator <Jens.Schleuse...@fossies.org>
Subject Re: Some interesting (?) observations on a mirror server (sa-update.fossies.org)
Date Fri, 21 Sep 2018 13:26:43 GMT
Hi,

> Some weeks ago I happened to look at the web server access log of the
> SpamAssassin rules update mirror sa-update.fossies.org and was surprised
> to find that at noon (midday) the log file was already much larger than
> the roughly expected half of a complete daily log.
>
> Out of curiosity I plotted the number of GET requests for update files
> (tarballs) per hour and saw an interesting pattern with a large peak
> between 6 and 7 a.m. (GMT+2). The main reason is probably the publication
> time (mostly between 5 and 6 a.m. GMT+2) plus the delay until the users'
> sa-update scripts run. The shape of the curves, with some curious (?)
> minima, is a little surprising to me, but it is constant and
> reproducible.
>
> A simple example text plot for a single day is attached (more accurate plots 
> are available under the URL given below).
>
> More interesting and "irritating" was that during the main update period
> I often found many entries (at least 100-1000) with the HTTP status 404
> ("Not Found"). That motivated me to write a simple script to find the
> cause by monitoring the update status and the update times of newly
> published rules update files.
>
> First I checked the local web log files, assuming that a 404 response to
> a request for an update file means that an external client already knew
> about a new file that the local mirror sa-update.fossies.org did not yet
> have available, i.e. had not yet fetched (via rsync).
>
> Additionally I checked the local DNS server (of the server provider) and the 
> DNS servers I found responsible for the domain spamassassin.org
>
> ns2.pccc.com.
> ns2.ena.com.
> c.auth-ns.sonic.net.
> b.auth-ns.sonic.net.
> a.auth-ns.sonic.net.
>
> via the command
>
> dig @<server> 3.3.3.updates.spamassassin.org txt +short
>
> The plots and an extract of the script output you can find under
>
> https://fossies.org/~schleusener/sa-update.mirror_analysis/
>  User: sa
>  PW: update
>
> The main reason for the 404 errors seems to be that the mirroring script
> on sa-update.fossies.org is started as a cron job only every 10 minutes.
>
> It would probably be better to query the original nameservers directly
> (the local nameserver, following the TTL, may answer with data up to one
> hour old) and to start an rsync job only if the response shows that a
> new file is available.
>
> If all mirror servers synchronized at least once every 10 minutes,
> another idea would be to set/change the DNS TXT entry only 10 minutes
> after the release (availability) of a new update file.
>
> Additionally I found that the synchronization of the above DNS servers
> seems to be delayed by some minutes. The "best" DNS server seems to be
> "ns2.ena.com", since it is always the first one to provide the new
> version.
>
> Maybe this behaviour is a little bit related to the current thread with the 
> subject "repeated sa-update problems" on the users list.
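
The suggestion quoted above (ask the original nameservers and start rsync
only when a new version is announced) could be sketched roughly as follows.
This is only an illustration under assumptions, not the script running on the
mirror: it assumes the TXT answer printed by "dig ... +short" is a quoted
integer version number, the function names are invented for the example, and
the local version and the rsync call are left as placeholders.

```python
import subprocess

# Nameservers and record name taken from the message above.
NAMESERVERS = [
    "ns2.pccc.com", "ns2.ena.com",
    "a.auth-ns.sonic.net", "b.auth-ns.sonic.net", "c.auth-ns.sonic.net",
]
RECORD = "3.3.3.updates.spamassassin.org"


def parse_version(dig_output):
    """Return the first all-digit TXT value in `dig +short` output, or None.

    Assumption: the TXT record holds a plain integer version number.
    """
    for line in dig_output.splitlines():
        token = line.strip().strip('"')
        if token.isdigit():
            return int(token)
    return None


def query_version(server, record=RECORD, timeout=10):
    """Run `dig @<server> <record> txt +short` and parse the answer."""
    try:
        result = subprocess.run(
            ["dig", f"@{server}", record, "txt", "+short"],
            capture_output=True, text=True, timeout=timeout, check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # dig missing or server not answering in time
    return parse_version(result.stdout)


def newest_version(versions):
    """Highest version among the servers that answered, or None."""
    known = [v for v in versions if v is not None]
    return max(known) if known else None


def needs_sync(published, local):
    """True when DNS announces a version newer than the local copy."""
    return published is not None and (local is None or published > local)


if __name__ == "__main__":
    local_version = 0  # placeholder: the mirror would read its own state
    published = newest_version(query_version(ns) for ns in NAMESERVERS)
    if needs_sync(published, local_version):
        print(f"new version {published}: start the rsync job here")
```

Taking the highest version over all nameservers is one way to cope with the
observation that the servers do not switch to a new version at the same time.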

Looking at the offered data again, I found it difficult to read, so I made
it more compact and have also added it to this mail as a text attachment.

Another "problem" I found is that some clients downloaded the identical
update tarball several times a day (the top IP roughly 300 times). That
is pointless (an HTTP HEAD or a DNS request would be sufficient), but it
may be bearable.
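
For anyone who wants to look for the same patterns in their own mirror log
(requests per hour, 404s per hour, repeated downloads of one tarball by one
client), here is a small sketch. It is not the script used for the plots; it
assumes the access log is in the common/combined log format and that update
tarballs have names ending in ".tar.gz", and the names LOG_LINE and summarize
are made up for the example.

```python
import re
from collections import Counter

# Assumption: common/combined log format, e.g.
# 192.0.2.1 - - [21/Sep/2018:06:15:02 +0200] "GET /x.tar.gz HTTP/1.1" 200 123
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):(?P<hour>\d{2}):[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def summarize(lines):
    """Count tarball GETs per hour, 404s per hour and repeated downloads.

    Returns (per_hour, not_found, repeated), where repeated maps
    (ip, day, path) to the number of successful downloads if above one.
    """
    per_hour = Counter()
    not_found = Counter()
    downloads = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m["method"] != "GET":
            continue
        if not m["path"].endswith(".tar.gz"):
            continue  # only look at the update tarballs
        hour = int(m["hour"])
        per_hour[hour] += 1
        if m["status"] == "404":
            not_found[hour] += 1
        elif m["status"] == "200":
            downloads[(m["ip"], m["day"], m["path"])] += 1
    repeated = {key: n for key, n in downloads.items() if n > 1}
    return per_hour, not_found, repeated
```

It can be fed an open log file directly, e.g. summarize(open(path)), where
path is the location of the access log on the mirror.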

Regards

Jens

-- 
FOSSIES - The Fresh Open Source Software archive
mainly for Internet, Engineering and Science
https://fossies.org/