spamassassin-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Spamassassin Wiki] Update of "OcrPlugin" by MaartenDeBoer
Date Tue, 28 Mar 2006 08:08:07 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Spamassassin Wiki" for change notification.

The following page has been changed by MaartenDeBoer:
http://wiki.apache.org/spamassassin/OcrPlugin

------------------------------------------------------------------------------
  
  You will need {{{giftopnm}}} and {{{gocr}}} installed.
  
+ == Installation ==
+ 
+ Save the two files below in your local configuration directory, adjusting the score in {{{Ocr.cf}}} as you like, and the
+ wordlist ({{{my @words =}}}) in {{{Ocr.pm}}} according to the spam you are receiving. You might want to run {{{gocr}}}
+ by hand on the image attachments to look for words that are correctly recognized.
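
As a rough illustration of checking an attachment by hand, the sketch below pipes a GIF through giftopnm into gocr and reports which candidate words were recognized. It is not part of the plugin. The function names and the example word list are hypothetical, and it assumes both tools are on the PATH and that gocr reads PNM data from standard input when given `-` as the file name.

```python
import subprocess


def ocr_gif(path):
    """Return the text gocr recognizes in a GIF file (giftopnm | gocr -)."""
    # Convert the GIF to PNM; giftopnm writes the result to stdout.
    pnm = subprocess.run(
        ["giftopnm", path], check=True, capture_output=True
    ).stdout
    # Assumption: "-" tells gocr to read the PNM image from stdin.
    result = subprocess.run(
        ["gocr", "-"], input=pnm, check=True, capture_output=True
    )
    return result.stdout.decode("utf-8", errors="replace")


def matched_words(text, words):
    """Return the entries of `words` that occur in `text`, ignoring case."""
    lowered = text.lower()
    return [w for w in words if w.lower() in lowered]
```

Calling `matched_words(ocr_gif("attachment.gif"), ["stock", "price"])` on a saved attachment would then show which entries are worth keeping in the word list; the file name and words here are placeholders.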
+ 
  == Remarks ==
  
   * Note that this is my first SA plugin, so any feedback is welcome
-  * The words checked for are specific for some spam I received a lot of recently. 
+  * The words checked for are specific for some spam I received a lot of recently.
+  * {{{gocr}}} can take up quite a bit of resources, so be careful. But it is only executed for messages that contain gif attachments.
+ 
+ == ToDo ==
+ 
-  * TODO: Words are hardcoded. Should be a configuration parameter instead.
+  * Words are hardcoded. Should be a configuration parameter instead.
+  * Instead of checking for specific words, it might be better to "check if the image contains a certain amount of text", since it is not very likely that people send legitimate mail with text in images.
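
One possible reading of the "certain amount of text" idea is to count the letters and digits in the OCR output and compare the count with a threshold. The sketch below is only that: the function name and the default threshold of 40 characters are assumptions for illustration, not values taken from the plugin.

```python
def looks_like_text_image(ocr_text, min_chars=40):
    """Return True if the OCR output contains at least `min_chars` letters or digits.

    Punctuation, whitespace, and underscores are not counted, so output that
    consists mostly of such filler characters does not inflate the total.
    `min_chars` is a hypothetical threshold that would need tuning.
    """
    recognized = sum(1 for ch in ocr_text if ch.isalnum())
    return recognized >= min_chars
```

A threshold like this trades false positives against misses: set too low, stray characters recognized in a logo or photo could trigger the rule; set too high, short image-only spam would pass.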
  
  -- Author: Maarten de Boer, mdeboer -at- iua -dot- upf -dot- edu
  
