spamassassin-sysadmins mailing list archives

From "Merijn van den Kroonenberg" <mer...@web2all.nl>
Subject Re: Eureka: truncation of 72_active.cf
Date Sat, 04 Nov 2017 21:04:56 GMT
> Merijn,
>
> I patched the generate-new-scores.sh locally on sa-vm1 using your patch
> file with a slight adjustment.  I changed the copied file name to
> "72_active_before_grep.cf" just to make it a little more obvious.  We
> will see how it looks tomorrow in the tmp working area on sa-vm1 and I
> will reply with the results.

Nice, it might be good to have this extra debugging info available for now.
I made this request before I knew about the language lines, so hopefully we
won't really need it now.

>
> I am not seeing how the 72_active.cf file is generated before line 200
> of generate-new-scores.sh.  I was going to go back a few days before
> Kevin's commit to remove the other languages from MILLION_USD to perform
> the same grep that you did.  I wanted to do this on the sa-vm1 server to
> figure out how to fix the grep properly so Kevin can put back those
> other languages.

Check Bugzilla bug 7497; I put in instructions on how to reproduce/test.
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7497
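
As a quick standalone illustration of the symptom (separate from the
reproduction steps in the bug), here is a minimal sketch. The file name, the
DEMO_* rule names and the stray Latin-1 byte are all made up for the demo, and
whether plain grep actually gives up depends on the grep version and the
active locale. The -a option is grep's standard way to treat input as text; I
am not claiming this is the fix that should go into generate-new-scores.sh.

```shell
#!/bin/sh
# Hypothetical demo: file name and rule lines are invented for illustration.
tmp=$(mktemp -d)
f="$tmp/72_active_demo.cf"

# Write a small rules file; \351 is a lone Latin-1 byte (invalid as UTF-8).
printf 'body DEMO_A /foo/\nscore DEMO_A 1\nbody DEMO_B /caf\351/\ndescribe DEMO_B demo\nscore DEMO_B 2\n' > "$f"

# Plain grep may stop emitting lines once it decides the input is binary
# (this depends on the grep implementation and the locale in effect).
plain=$(grep -v '^score' "$f" 2>/dev/null | wc -l | tr -d ' ')

# -a makes grep treat the input as text, so every non-score line is kept.
kept=$(grep -a -v '^score' "$f" | wc -l | tr -d ' ')

echo "plain grep kept $plain line(s), grep -a kept $kept line(s)"
rm -rf "$tmp"
```

If the two counts differ on a given machine, that machine shows the same kind
of truncation as the generated 72_active.cf.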

>
> Dave
>
>
>
> From: Merijn van den Kroonenberg <merijn@web2all.nl>
> Sent: Wednesday, November 1, 2017 8:52 AM
> To: David Jones
> Subject: Eureka: truncation of 72_active.cf
>
> Hi David,
>
> After backtracking the scripts while verifying generated files I finally
> came very close to the cause of the truncation issue.
>
> The scoreset files and 72_scores.cf are truncated after MILLION_USD
> because rules/72_active.cf itself is truncated.
>
> This file is generated by build/mkrules (triggered from Makefile.pl).
>
> It ends like this:
>
> body MILLION_USD                /Million\b.{0,40}\b(?:United States? Dollars?|USD)/i
> describe MILLION_USD            Talks about millions of dollars
> #score MILLION_USD 2
> Binary file rules/72_active.cf matches
>
> As you can see, this last line is a typical grep message. So it was not
> hard to track it to the script causing this:
>
> ./masses/rule-update-score-gen/generate-new-scores.sh:202:grep -v ^score rules/72_active.cf > rules/72_active.cf-scoreless
> ./masses/rule-update-score-gen/generate-new-scores.sh:203:mv -f rules/72_active.cf-scoreless rules/72_active.cf
>
> My theory is that grep encounters too many non-text characters in
> rules/72_active.cf, so it decides the file is binary after all and stops
> grepping the rest of the file.
>
> As you can see in ./masses/rule-update-score-gen/generate-new-scores.sh
> the original 72_active.cf is overwritten so I cannot see what is
> actually in there that causes grep to panic.
>
> I think if we patch the scripts to make a copy right after mkrules runs,
> we will be able to see or test why grep chokes.
>
> I attached a patch file with a proposal for debugging.
>
> We are very close to the problem now I think.
>
> Met vriendelijke groet,
>
> Merijn van den Kroonenberg
>
> Web2All B.V.
> Gulickstraat 17
> 5931 LA Tegelen
> Tel. +31 475 775511
> Fax. +31 475 338290
>
> merijn@web2all.nl | www.web2all.nl
>
>


