pdfbox-users mailing list archives

From Luca Loiodice <loiod...@csdisco.com>
Subject Re: Detect Invisible Text (placed by tools which make searchable PDF)
Date Fri, 03 May 2019 15:23:32 GMT
Excellent, looks promising, thanks a lot for your help!

A related question (still in the area of low-quality extracted text): would
it also be possible to detect which characters are drawn with a font that
has no Unicode mapping? I generally know how to detect whether a PDF has,
for example, a Type 3 font with no Unicode mapping, but sometimes that font
is used for only a small portion of the characters on the page, and I want
to handle those characters specially.
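One possible way to get this per character, sketched here and assuming PDFBox 2.0.x, is to subclass PDFTextStripper, override processTextPosition, and ask each glyph's font whether PDFont.toUnicode(int) returns anything for the glyph's character codes. The class name UnmappedGlyphStripper is hypothetical. Some caveats, based on my understanding of PDFBox 2.0: TextPosition.getUnicode() is not a reliable signal, because for simple fonts (Type 3 included) PDFBox substitutes the raw character code when no mapping exists. Unmapped glyphs in composite (Type 0) fonts appear to be dropped before processTextPosition is ever called, so this check is mostly useful for simple fonts. For simple fonts, toUnicode also falls back to the encoding and glyph names, so a Type 3 font whose glyphs carry standard names (such as /a) counts as mapped.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

/**
 * Hypothetical sketch, assuming PDFBox 2.0.x: records every glyph whose font
 * cannot map its character code(s) to Unicode, so those glyphs can be handled
 * separately instead of judging the font for the whole page.
 */
public class UnmappedGlyphStripper extends PDFTextStripper {

    /** Glyphs for which at least one character code had no Unicode mapping. */
    private final List<TextPosition> unmapped = new ArrayList<>();

    public UnmappedGlyphStripper() throws IOException {
        super();
    }

    @Override
    protected void processTextPosition(TextPosition text) {
        try {
            if (hasNoUnicodeMapping(text)) {
                unmapped.add(text);
            }
        } catch (IOException e) {
            unmapped.add(text); // treat a failed lookup as "no mapping"
        }
        // Keep the default behaviour so getText() still returns the page text.
        super.processTextPosition(text);
    }

    /** True if the glyph's font yields no Unicode string for one of its codes. */
    static boolean hasNoUnicodeMapping(TextPosition text) throws IOException {
        PDFont font = text.getFont();
        for (int code : text.getCharacterCodes()) {
            String unicode = font.toUnicode(code);
            if (unicode == null || unicode.isEmpty()) {
                return true;
            }
        }
        return false;
    }

    /** Glyphs collected during the last getText() call(s). */
    public List<TextPosition> getUnmapped() {
        return unmapped;
    }
}
```

After calling getText(document), each TextPosition in getUnmapped() carries its position (getXDirAdj(), getYDirAdj()) and page-relative size, which should be enough to treat those characters differently, for example by masking them out or sending their region to OCR.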

Thanks again

On Fri, May 3, 2019 at 10:07 AM Tilman Hausherr <THausherr@t-online.de> wrote:

> These answers may help:
>
> https://stackoverflow.com/questions/50044892/pdfbox-invisible-text-from-pdftextstripper-not-clip-path-or-color-issue
>
> https://stackoverflow.com/questions/50487520/pdfbox-2-0-invisible-text-from-pdftextstripper
>
> Tilman
>
> On 03.05.2019 at 17:02, Luca Loiodice wrote:
> > Hello,
> >
> > I need to remove (often low-quality) invisible text placed on images by
> > tools that use OCR to make searchable PDFs.
> >
> > We use PDFBox ourselves to make searchable PDFs, and we use
> > setRenderingMode(RenderingMode.NEITHER) when we place the text to make
> > it invisible. We also use PDFBox's text stripper to extract text from
> > PDFs.
> >
> > What I am not sure about is whether there is a way for the text stripper
> > to identify the characters that have been placed as invisible, and to
> > remove only those in some cases.
> >
> > Thanks for your help,
> > Luca
> >
>
>
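For the original question, text hidden with setRenderingMode(RenderingMode.NEITHER) can in principle be told apart inside the stripper, because the text rendering mode (the Tr operator) is part of the graphics state that is current while each glyph is processed. The sketch below assumes PDFBox 2.0.x and is not necessarily what the linked Stack Overflow answers do; the class name RenderingModeFilterStripper and its keepInvisible flag are hypothetical. It only covers text hidden via rendering mode 3 (NEITHER) or 7 (NEITHER_CLIP). Text hidden by clipping, by a fill color matching the background, or by an image drawn on top of it is not detected this way.

```java
import java.io.IOException;

import org.apache.pdfbox.pdmodel.graphics.state.RenderingMode;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

/**
 * Hypothetical sketch, assuming PDFBox 2.0.x: a PDFTextStripper that keeps
 * either only the visible or only the invisible glyphs, judged by the text
 * rendering mode in effect when each glyph is shown.
 */
public class RenderingModeFilterStripper extends PDFTextStripper {

    private final boolean keepInvisible;

    /**
     * @param keepInvisible true to extract only invisible text,
     *                      false to extract only visible text
     */
    public RenderingModeFilterStripper(boolean keepInvisible) throws IOException {
        super();
        this.keepInvisible = keepInvisible;
    }

    /** Modes 3 (NEITHER) and 7 (NEITHER_CLIP) neither fill nor stroke the glyph. */
    static boolean isInvisible(RenderingMode mode) {
        return mode == RenderingMode.NEITHER || mode == RenderingMode.NEITHER_CLIP;
    }

    @Override
    protected void processTextPosition(TextPosition text) {
        // Called while the glyph is being shown, so the current graphics
        // state still carries that glyph's rendering mode.
        RenderingMode mode = getGraphicsState().getTextState().getRenderingMode();
        if (isInvisible(mode) == keepInvisible) {
            super.processTextPosition(text); // pass through to normal extraction
        }
        // Otherwise the glyph is dropped from the extracted text.
    }
}
```

With this, new RenderingModeFilterStripper(false).getText(document) would return the text a reader actually sees, and passing true would return only the hidden OCR layer, so the decision to discard that layer can be made per document or per page.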
