spamassassin-users mailing list archives

From John Hardin <jhar...@impsec.org>
Subject Re: Is fuzzyocr i.e. Image scanning
Date Wed, 17 Oct 2018 14:56:19 GMT
On Wed, 17 Oct 2018, Matus UHLAR - fantomas wrote:

> On 16.10.18 18:42, RW wrote:
>> Bayes might work, but I wouldn't like to see it added to body text
>> because corrupted text could look like obfuscation.
>
> it should be pushed back to body text just for filters like bayes.
> The same could/should be done for attached .doc, .pdf files etc.

...which would be much more reliable than OCR.

If it was a resource-allocation decision for pulling text from doc/pdf vs. 
updating OCR, I'd push for the former.

-- 
  John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
  jhardin@impsec.org    FALaholic #11174     pgpk -a jhardin@impsec.org
  key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
   The problem is when people look at Yahoo, slashdot, or groklaw and
   jump from obvious and correct observations like "Oh my God, this
   place is teeming with utter morons" to incorrect conclusions like
   "there's nothing of value here".        -- Al Petrofsky, in Y! SCOX
-----------------------------------------------------------------------
  566 days since the first commercial re-flight of an orbital booster (SpaceX)
