spamassassin-users mailing list archives

From polloxx <poll...@gmail.com>
Subject Re: FuzzyOCR
Date Thu, 07 Jul 2011 12:09:22 GMT
On Wed, Jul 6, 2011 at 6:33 PM, John Hardin <jhardin@impsec.org> wrote:
> On Wed, 6 Jul 2011, polloxx wrote:
>
>> Works fine at the CL.
>
> OK. Just to be clear, you took a jpeg-format image file and used jpegtopnm
> to convert it to a pnm file, and got a correct .pnm image file out? Did you
> do this to verify the exit code from jpegtopnm:
>
>    echo $?
>

$ /usr/bin/jpegtopnm ./spam1.jpg > spam1.pnm
jpegtopnm: WRITING PPM FILE

spam1.pnm is created.
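
A note on the [2048] in the original log line, in case it helps: that number looks like a raw wait status rather than an exit code. Assuming FuzzyOcr logs the status the way Perl's $? reports it (an assumption; the plugin source has not been checked here), the exit code sits in the high byte, so 2048 would mean jpegtopnm exited with code 8. A minimal Python sketch of that decoding (decode_wait_status is a made-up helper name, not part of SA or FuzzyOcr):

```python
def decode_wait_status(status: int) -> dict:
    """Split a 16-bit POSIX-style wait status into its parts.

    Assumption: the value is laid out like Perl's $? after system(),
    i.e. exit code in the high byte, signal number in the low 7 bits,
    and the core-dump flag in bit 7.
    """
    return {
        "exit_code": status >> 8,
        "signal": status & 0x7F,
        "core_dumped": bool(status & 0x80),
    }


# The value reported in the FuzzyOCR log line.
print(decode_wait_status(2048))
# {'exit_code': 8, 'signal': 0, 'core_dumped': False}
```

If that reading is right, the thing to look up is what jpegtopnm means by exit code 8, not 2048.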



>> Nothing about the error. (It's a SA error I think)
>
> It looks to me like SA is just reporting an error code from jpegtopnm. I did
> some brief digging and couldn't find anything about that particular error
> code; it might take a look at the sources to learn what it means.
>
>> apt-get did not alter jpegtopnm.
>
> Bummer, so much for the easy explanation... :)
>
> Do you have a sample message having an image attachment that you can run
> through SA manually to test things? If not, try to get one.
>
> It would be useful to see the debugging output of spamassassin where it's
> talking about fuzzyocr. Do you know how to run spamassassin in debug mode
> against a test message?
>

# spamassassin --debug FuzzyOCR < ./spam1.jpg > /dev/null
Jul  7 13:46:19.889 [4590] dbg: FuzzyOcr: focr_bin_helper: 'pnmnorm,pnminvert,ppmtopgm'
Jul  7 13:46:19.889 [4590] info: FuzzyOcr: Adding <3> new helper apps
Jul  7 13:46:19.889 [4590] dbg: FuzzyOcr: focr_bin_helper: 'tesseract'
Jul  7 13:46:19.890 [4590] info: FuzzyOcr: Adding <1> new helper apps
Jul  7 13:46:19.891 [4590] info: FuzzyOcr: Starting preprocessor parser for file "/etc/mail/spamassassin/FuzzyOcr.preps"...
Jul  7 13:46:19.891 [4590] dbg: FuzzyOcr: line: preprocessor normalize {
Jul  7 13:46:19.891 [4590] dbg: FuzzyOcr: line: command = pnmnorm
Jul  7 13:46:19.891 [4590] dbg: FuzzyOcr: line: }
Jul  7 13:46:19.891 [4590] dbg: FuzzyOcr: line: preprocessor invert {
Jul  7 13:46:19.891 [4590] dbg: FuzzyOcr: line: command = pnminvert
Jul  7 13:46:19.891 [4590] dbg: FuzzyOcr: line: }
Jul  7 13:46:19.891 [4590] dbg: FuzzyOcr: line: preprocessor ppmtopgm {
Jul  7 13:46:19.891 [4590] dbg: FuzzyOcr: line: command = ppmtopgm
Jul  7 13:46:19.891 [4590] dbg: FuzzyOcr: line: }
Jul  7 13:46:19.891 [4590] dbg: FuzzyOcr: line: preprocessor maketiff {
Jul  7 13:46:19.891 [4590] dbg: FuzzyOcr: line: command = pnmtotiff
Jul  7 13:46:19.891 [4590] dbg: FuzzyOcr: line: args = -color -truecolor
Jul  7 13:46:19.891 [4590] dbg: FuzzyOcr: line: }
Jul  7 13:46:19.891 [4590] info: FuzzyOcr: Starting scanset parser for file "/etc/mail/spamassassin/FuzzyOcr.scansets"...
Jul  7 13:46:19.891 [4590] dbg: FuzzyOcr: line scanset ocrad {
Jul  7 13:46:19.892 [4590] dbg: FuzzyOcr: line command = $ocrad
Jul  7 13:46:19.892 [4590] dbg: FuzzyOcr: line args = -s5 $input
Jul  7 13:46:19.892 [4590] dbg: FuzzyOcr: line }
Jul  7 13:46:19.892 [4590] dbg: FuzzyOcr: line scanset ocrad-invert {
Jul  7 13:46:19.892 [4590] dbg: FuzzyOcr: line command = $ocrad
Jul  7 13:46:19.892 [4590] dbg: FuzzyOcr: line args = -s5 -i $input
Jul  7 13:46:19.892 [4590] dbg: FuzzyOcr: line }
Jul  7 13:46:19.892 [4590] dbg: FuzzyOcr: line scanset ocrad-decolorize-invert {
Jul  7 13:46:19.892 [4590] dbg: FuzzyOcr: line preprocessors = ppmtopgm
Jul  7 13:46:19.892 [4590] dbg: FuzzyOcr: line command = $ocrad
Jul  7 13:46:19.892 [4590] dbg: FuzzyOcr: line args = -s5 -i $input
Jul  7 13:46:19.892 [4590] dbg: FuzzyOcr: line }
Jul  7 13:46:19.892 [4590] dbg: FuzzyOcr: line scanset ocrad-decolorize {
Jul  7 13:46:19.892 [4590] dbg: FuzzyOcr: line preprocessors = ppmtopgm
Jul  7 13:46:19.892 [4590] dbg: FuzzyOcr: line command = $ocrad
Jul  7 13:46:19.892 [4590] dbg: FuzzyOcr: line args = -s5 $input
Jul  7 13:46:19.892 [4590] dbg: FuzzyOcr: line }
Jul  7 13:46:19.893 [4590] dbg: FuzzyOcr: line scanset gocr {
Jul  7 13:46:19.893 [4590] dbg: FuzzyOcr: line command = $gocr
Jul  7 13:46:19.893 [4590] dbg: FuzzyOcr: line args = -i $input
Jul  7 13:46:19.893 [4590] dbg: FuzzyOcr: line }
Jul  7 13:46:19.893 [4590] dbg: FuzzyOcr: line scanset gocr-180 {
Jul  7 13:46:19.893 [4590] dbg: FuzzyOcr: line command = $gocr
Jul  7 13:46:19.893 [4590] dbg: FuzzyOcr: line args = -l 180 -d 2 -i $input
Jul  7 13:46:19.893 [4590] dbg: FuzzyOcr: line }
Jul  7 13:46:19.893 [4590] dbg: FuzzyOcr: line scanset tesseract {
Jul  7 13:46:19.893 [4590] dbg: FuzzyOcr: line preprocessors = maketiff
Jul  7 13:46:19.893 [4590] dbg: FuzzyOcr: line command = $tesseract
Jul  7 13:46:19.893 [4590] dbg: FuzzyOcr: line args = $input $output
Jul  7 13:46:19.893 [4590] dbg: FuzzyOcr: line force_output_in = $output.txt
Jul  7 13:46:19.893 [4590] dbg: FuzzyOcr: line }
Jul  7 13:46:20.439 [4590] info: FuzzyOcr: Searching in: /usr/local/netpbm/bin
Jul  7 13:46:20.440 [4590] info: FuzzyOcr: Searching in: /usr/local/bin
Jul  7 13:46:20.440 [4590] info: FuzzyOcr: Searching in: /usr/bin
Jul  7 13:46:20.440 [4590] info: FuzzyOcr: Using gifsicle => /usr/bin/gifsicle
Jul  7 13:46:20.440 [4590] info: FuzzyOcr: Using giffix => /usr/bin/giffix
Jul  7 13:46:20.440 [4590] info: FuzzyOcr: Using giftext => /usr/bin/giftext
Jul  7 13:46:20.440 [4590] info: FuzzyOcr: Using gifinter => /usr/bin/gifinter
Jul  7 13:46:20.440 [4590] info: FuzzyOcr: Using giftopnm => /usr/bin/giftopnm
Jul  7 13:46:20.440 [4590] info: FuzzyOcr: Using jpegtopnm => /usr/bin/jpegtopnm
Jul  7 13:46:20.440 [4590] info: FuzzyOcr: Using pngtopnm => /usr/bin/pngtopnm
Jul  7 13:46:20.440 [4590] info: FuzzyOcr: Using bmptopnm => /usr/bin/bmptopnm
Jul  7 13:46:20.440 [4590] info: FuzzyOcr: Using tifftopnm => /usr/bin/tifftopnm
Jul  7 13:46:20.440 [4590] info: FuzzyOcr: Using ppmhist => /usr/bin/ppmhist
Jul  7 13:46:20.440 [4590] info: FuzzyOcr: Using pamfile => /usr/bin/pamfile
Jul  7 13:46:20.440 [4590] info: FuzzyOcr: Using ocrad => /usr/bin/ocrad
Jul  7 13:46:20.440 [4590] dbg: FuzzyOcr: Cannot find executable for gocr
Jul  7 13:46:20.440 [4590] info: FuzzyOcr: Using pnmnorm => /usr/bin/pnmnorm
Jul  7 13:46:20.440 [4590] info: FuzzyOcr: Using pnminvert => /usr/bin/pnminvert
Jul  7 13:46:20.440 [4590] info: FuzzyOcr: Using ppmtopgm => /usr/bin/ppmtopgm
Jul  7 13:46:20.440 [4590] dbg: FuzzyOcr: Cannot find executable for tesseract
Jul  7 13:46:20.441 [4590] dbg: FuzzyOcr: Threshold[max_hash] => 5
Jul  7 13:46:20.441 [4590] dbg: FuzzyOcr: Threshold[c] => 5
Jul  7 13:46:20.441 [4590] dbg: FuzzyOcr: Threshold[s] => 0.01
Jul  7 13:46:20.441 [4590] dbg: FuzzyOcr: Threshold[w] => 0.01
Jul  7 13:46:20.441 [4590] dbg: FuzzyOcr: Threshold[h] => 0.01
Jul  7 13:46:20.441 [4590] dbg: FuzzyOcr: Threshold[cn] => 0.01
Jul  7 13:46:20.441 [4590] dbg: FuzzyOcr: focr_add_score => 1
Jul  7 13:46:20.441 [4590] dbg: FuzzyOcr: focr_autodisable_negative_score => -5
Jul  7 13:46:20.441 [4590] dbg: FuzzyOcr: focr_autodisable_score => 1000
Jul  7 13:46:20.441 [4590] dbg: FuzzyOcr: focr_autosort_buffer => 10
Jul  7 13:46:20.441 [4590] dbg: FuzzyOcr: focr_autosort_scanset => 1
Jul  7 13:46:20.441 [4590] dbg: FuzzyOcr: focr_base_score => 5
Jul  7 13:46:20.441 [4590] dbg: FuzzyOcr: focr_corrupt_score => 2.5
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_corrupt_unfixable_score => 5
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_counts_required => 2
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_db_hash => /etc/mail/spamassassin/FuzzyOcr.db
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_db_max_days => 35
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_db_safe => /etc/mail/spamassassin/FuzzyOcr.safe.db
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_digest_db => /etc/mail/spamassassin/FuzzyOcr.hashdb
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_enable_image_hashing => 0
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_global_timeout => 0
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_global_wordlist => /etc/mail/spamassassin/FuzzyOcr.words
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_hashing_learn_scanned => 1
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_keep_bad_images => 0
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_log_pmsinfo => 1
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_log_stderr => 1
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_max_height => 800
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_max_width => 800
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_min_height => 4
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_min_width => 4
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_minimal_scanset => 1
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_mysql_db => FuzzyOcr
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_mysql_hash => Hash
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_mysql_host => localhost
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_mysql_port => 3306
Jul  7 13:46:20.442 [4590] dbg: FuzzyOcr: focr_mysql_safe => Safe
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_mysql_update_hash => 0
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_mysql_user => fuzzyocr
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_no_homedirs => 0
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_path_bin => /usr/local/netpbm/bin:/usr/local/bin:/usr/bin
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_pdf_maxpages => 1
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_personal_wordlist => __userstate__/FuzzyOcr.words
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_preprocessor_file => /etc/mail/spamassassin/FuzzyOcr.preps
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_scan_pdfs => 0
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_scanset_file => /etc/mail/spamassassin/FuzzyOcr.scansets
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_score_ham => 0
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_skip_bmp => 0
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_skip_gif => 0
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_skip_jpeg => 0
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_skip_png => 0
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_skip_tiff => 0
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_skip_updates => 0
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_strip_numbers => 1
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_threshold => 0.25
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_timeout => 10
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_twopass_scoring_factor => 1.5
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_unique_matches => 0
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_verbose => 1
Jul  7 13:46:20.443 [4590] dbg: FuzzyOcr: focr_wrongctype_score => 1.5
Jul  7 13:46:20.444 [4590] dbg: FuzzyOcr: focr_wrongext_score => 1.5
Jul  7 13:46:20.444 [4590] info: FuzzyOcr: Loaded preprocessor normalize: /usr/bin/pnmnorm
Jul  7 13:46:20.444 [4590] info: FuzzyOcr: Loaded preprocessor invert: /usr/bin/pnminvert
Jul  7 13:46:20.444 [4590] info: FuzzyOcr: Loaded preprocessor ppmtopgm: /usr/bin/ppmtopgm
Jul  7 13:46:20.444 [4590] info: FuzzyOcr: Loaded preprocessor maketiff: pnmtotiff -color -truecolor
Jul  7 13:46:20.444 [4590] info: FuzzyOcr: Using scan ocrad: /usr/bin/ocrad -s5 $input
Jul  7 13:46:20.444 [4590] info: FuzzyOcr: Using scan ocrad-invert: /usr/bin/ocrad -s5 -i $input
Jul  7 13:46:20.444 [4590] info: FuzzyOcr: Using scan ocrad-decolorize-invert: /usr/bin/ocrad -s5 -i $input
Jul  7 13:46:20.444 [4590] info: FuzzyOcr: Using scan ocrad-decolorize: /usr/bin/ocrad -s5 $input
Jul  7 13:46:20.444 [4590] info: FuzzyOcr: Using scan gocr: $gocr -i $input
Jul  7 13:46:20.444 [4590] info: FuzzyOcr: Using scan gocr-180: $gocr -l 180 -d 2 -i $input
Jul  7 13:46:20.444 [4590] info: FuzzyOcr: Using scan tesseract: $tesseract $input $output
Jul  7 13:46:20.444 [4590] info: FuzzyOcr: Added <44> words from "/etc/mail/spamassassin/FuzzyOcr.words"
Jul  7 13:46:25.509 [4590] dbg: FuzzyOcr: Starting FuzzyOcr...
Jul  7 13:46:25.509 [4590] info: FuzzyOcr: Processing Message with ID "<no messageid>" (<no sender> -> <no receipients>)
Jul  7 13:46:25.509 [4590] dbg: FuzzyOcr: Skipping OCR, no image files found...
Jul  7 13:46:25.509 [4590] dbg: FuzzyOcr: Processed in 0.000398 sec.
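
The last lines ("<no messageid>", "Skipping OCR, no image files found...") are what one would expect when the raw JPEG itself is fed to spamassassin on stdin: SA parses its input as a mail message, so it finds no headers and no image attachment, and jpegtopnm is never called. To exercise the converter, the image probably needs to be wrapped in a real message first. A minimal Python sketch of that, using only the standard library (the addresses, subject, and the build_test_message name are placeholders made up for illustration):

```python
import os
import sys
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def build_test_message(jpeg_bytes: bytes, filename: str = "spam1.jpg") -> bytes:
    """Wrap raw JPEG bytes in a multipart message as an image/jpeg attachment."""
    msg = EmailMessage()
    # Placeholder headers; any syntactically valid values will do for a test.
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "FuzzyOcr test"
    msg["Date"] = formatdate(localtime=False)
    msg["Message-ID"] = make_msgid(domain="example.com")
    msg.set_content("Test message with an image attachment.")
    # add_attachment base64-encodes the bytes and turns the message
    # into multipart/mixed.
    msg.add_attachment(
        jpeg_bytes, maintype="image", subtype="jpeg", filename=filename
    )
    return msg.as_bytes()


if __name__ == "__main__" and len(sys.argv) == 2:
    # Read the image named on the command line, write the message to stdout.
    path = sys.argv[1]
    with open(path, "rb") as fh:
        data = fh.read()
    sys.stdout.buffer.write(build_test_message(data, os.path.basename(path)))
```

Saved as, say, wrap_image.py, it could be run as "python3 wrap_image.py ./spam1.jpg > test.eml", and test.eml then fed to the same spamassassin debug command in place of ./spam1.jpg. A saved copy of a real spam message with an image attachment would serve the same purpose.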


>> On Wed, Jul 6, 2011 at 4:40 PM, John Hardin <jhardin@impsec.org> wrote:
>>>>
>>>> On Thu, Jun 23, 2011 at 1:56 PM, polloxx <polloxx@gmail.com> wrote:
>>>>>
>>>>> after an apt-get upgrade FuzzyOCR has stopped working. I get the
>>>>> following error in the log:
>>>>>
>>>>> FuzzyOCR: 2011-06-22 17:00:38 [3057] /usr/bin/jpegtopnm: Returned
>>>>> [2048], skipping...
>>>
>>> What happens when you try to run /usr/bin/jpegtopnm from the command
>>> line?
>>>
>>> What does the jpegtopnm man page say about that return code?
>>>
>>> Did apt-get change the jpegtopnm program?
>
> --
>  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
>  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
>  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
> -----------------------------------------------------------------------
>  I would buy a Mac today if I was not working at Microsoft.
>                          -- James Allchin, Microsoft VP of Platforms
> -----------------------------------------------------------------------
>  Tomorrow: Robert Heinlein's 104th birthday
>
