pdfbox-users mailing list archives

From Luca Loiodice <loiod...@csdisco.com>
Subject Detect Invisible Text (placed by tools which make searchable PDF)
Date Fri, 03 May 2019 15:02:56 GMT
Hello,

I need to remove the (often low-quality) invisible text that OCR tools
place over images to make a PDF searchable.

We use pdfbox ourselves to make searchable PDFs, and we call
setRenderingMode(RenderingMode.NEITHER) when we place the text to
make it invisible. We also use pdfbox's text stripper to remove text
from PDFs.

What I am not sure about is whether there is a way for the text stripper
to identify the characters that have been placed as invisible, and to
remove only those in some cases.

Thanks for your help,
Luca
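
A minimal sketch of the mechanism behind the question, not pdfbox code:
in a PDF content stream the text rendering mode is set with the Tr
operator, and mode 3 (neither fill nor stroke) is what
RenderingMode.NEITHER corresponds to. The toy scanner below walks an
already-decoded content stream and collects the literal strings shown
while mode 3 is active. The class and method names are hypothetical, and
the tokenizer is deliberately simplified (no hex strings, no comments, no
font decoding). In pdfbox itself the equivalent check would presumably
sit in a text stripper subclass that inspects the rendering mode of the
current graphics state; that is an assumption and is not shown here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper for illustration only; it does not use pdfbox.
class InvisibleTextScan {

    private static final String NUMBER = "[+-]?(\\d+\\.?\\d*|\\.\\d+)";

    /** Returns the literal strings shown while text rendering mode 3 is active. */
    static List<String> findInvisibleText(String content) {
        List<String> hidden = new ArrayList<>();
        Deque<Integer> saved = new ArrayDeque<>(); // modes saved by the q operator
        List<String> operands = new ArrayList<>(); // operands seen since the last operator
        int mode = 0;                              // PDF default: fill text
        int i = 0;
        while (i < content.length()) {
            char c = content.charAt(i);
            // Skip whitespace, array brackets, and stray closing parentheses.
            if (Character.isWhitespace(c) || c == '[' || c == ']' || c == ')') {
                i++;
                continue;
            }
            if (c == '(') {
                // Literal string: track nested parentheses; after a backslash,
                // keep the next character as-is (a simplification of PDF escapes).
                StringBuilder sb = new StringBuilder();
                int depth = 1;
                i++;
                while (i < content.length() && depth > 0) {
                    char d = content.charAt(i++);
                    if (d == '\\' && i < content.length()) {
                        sb.append(content.charAt(i++));
                    } else if (d == '(') {
                        depth++;
                        sb.append(d);
                    } else if (d == ')') {
                        depth--;
                        if (depth > 0) sb.append(d);
                    } else {
                        sb.append(d);
                    }
                }
                operands.add("(" + sb); // leading "(" marks a string operand
                continue;
            }
            int start = i;
            while (i < content.length() && !Character.isWhitespace(content.charAt(i))
                    && "()[]".indexOf(content.charAt(i)) < 0) {
                i++;
            }
            String tok = content.substring(start, i);
            // Numbers, names (/F1) and hex/dictionary tokens are operands.
            if (tok.matches(NUMBER) || tok.startsWith("/")
                    || tok.startsWith("<") || tok.startsWith(">")) {
                operands.add(tok);
                continue;
            }
            // Anything else is treated as an operator.
            if (tok.equals("Tr") && !operands.isEmpty()) {
                String last = operands.get(operands.size() - 1);
                if (last.matches(NUMBER)) mode = (int) Double.parseDouble(last);
            } else if (tok.equals("q")) {
                saved.push(mode);
            } else if (tok.equals("Q") && !saved.isEmpty()) {
                mode = saved.pop();
            } else if (mode == 3 && (tok.equals("Tj") || tok.equals("TJ")
                    || tok.equals("'") || tok.equals("\""))) {
                StringBuilder shown = new StringBuilder();
                for (String op : operands) {
                    if (op.startsWith("(")) shown.append(op.substring(1));
                }
                hidden.add(shown.toString());
            }
            operands.clear();
        }
        return hidden;
    }
}
```

For a stream such as "BT /F1 12 Tf 3 Tr (hidden) Tj 0 Tr (visible) Tj ET",
only the first string is reported, because the second one is shown after
the mode has been switched back to 0.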
