spamassassin-users mailing list archives

From Rupert Gallagher <>
Subject Re: Is fuzzyocr i.e. Image scanning
Date Wed, 17 Oct 2018 06:16:22 GMT
My comments on

IC is an effort to dig a hole in the water, because the problem of image spam with obfuscated
text cannot be solved by OCR.

My approach is a "better safe than sorry" best practice that anyone can implement with existing

1. do not display the content of attachments and linked resources inline;
2. give a high spam score (>=5) to any email with a very low text-to-image ratio.
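
Point 2 could be sketched roughly as follows, in Python with only the standard library. This is an illustration under stated assumptions, not a SpamAssassin plugin: the function names, the ratio threshold of 0.01 text bytes per image byte, and the choice to count only text/plain parts are all hypothetical and would need tuning on real mail.

```python
from email.message import EmailMessage

# Hypothetical tuning values, not taken from SpamAssassin or from this thread.
MIN_TEXT_PER_IMAGE_BYTE = 0.01  # below this ratio the mail counts as image-heavy
IMAGE_HEAVY_SCORE = 5.0         # the ">=5" score suggested above


def text_and_image_sizes(msg):
    """Return (text_bytes, image_bytes) summed over the leaf parts of msg."""
    text_bytes = 0
    image_bytes = 0
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        payload = part.get_payload(decode=True) or b""
        if part.get_content_maintype() == "image":
            image_bytes += len(payload)
        elif part.get_content_type() == "text/plain":
            text_bytes += len(payload.strip())
    return text_bytes, image_bytes


def image_heavy_score(msg):
    """Return IMAGE_HEAVY_SCORE for image-heavy mail, otherwise 0.0."""
    text_bytes, image_bytes = text_and_image_sizes(msg)
    if image_bytes == 0:
        return 0.0  # no images at all, so the rule does not apply
    ratio = text_bytes / image_bytes
    return IMAGE_HEAVY_SCORE if ratio < MIN_TEXT_PER_IMAGE_BYTE else 0.0


if __name__ == "__main__":
    # Two characters of text next to a 10 kB image attachment.
    demo = EmailMessage()
    demo.set_content("Hi")
    demo.add_attachment(b"\x00" * 10000, maintype="image", subtype="png")
    print(image_heavy_score(demo))
```

Counting bytes rather than words keeps the sketch simple; HTML-only mail would need its markup stripped before it is counted as text.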

On PDF and similar attachments, reject anything with built-in macros or scripts.
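
For PDF, a very naive version of that check might look like the Python sketch below. It assumes that searching the raw bytes for the /JavaScript, /JS and /Launch names is good enough for a first pass. The function name is hypothetical, and the scan will miss anything hidden inside compressed object streams, so a real filter would need a proper PDF parser.

```python
import re

# PDF names may hide characters as #xx hex escapes, e.g. /J#61vaScript.
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")

# Names that introduce script or launch actions in a PDF dictionary.
_ACTIVE_NAMES = re.compile(rb"/(JavaScript|JS|Launch)\b")


def pdf_has_active_content(data: bytes) -> bool:
    """Return True if the raw PDF bytes mention a script or launch action."""
    # Undo hex escapes first so obfuscated names are matched as well.
    normalized = _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), data)
    return _ACTIVE_NAMES.search(normalized) is not None
```

False positives are possible, because the same byte sequences can occur by chance inside binary streams; under a "better safe than sorry" policy that is the cheaper mistake.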


On Tue, Oct 16, 2018 at 06:49, Olivier <> wrote:

> Brent,
> I have Fuzzy OCR installed and running, but the only rule that was
> triggered 22 times during the past 40 days was FUZZY_OCR_WRONG_CTYPE,
> meaning that the image type does not match the content-type set for
> That is still a valid catch, but not based on the OCR'ed text.
> One of my reservations about FuzzyOCR is that you have to provide an
> independent word list, while we have a very good tool to analyze text
> contents: SpamAssassin itself. So I would much prefer FuzzyOCR to feed
> the OCR'ed text back to SA for further analysis (the way pdfAssassin is
> working). But then, we need a way to detect that the OCR process has
> worked, that some more or less valid text, in a valid language has been
> extracted.
> Another approach I like is the one of Image Cerberus (dig in
> which uses metadata of the image
> (size, histogram of colours, etc.) to classify the image as probable
> spam or probable ham and then applies a Bayes classifier.
> As for your question about the place for image scanning, if your MTA has
> the resources to do so, why not? And if FuzzyOCR is not yet the ultimate
> OCR solution, it is still improving, so why give up a tool that can
> help?
> Regards,
> Olivier
> --