pdfbox-users mailing list archives

From Luca Loiodice <loiod...@csdisco.com>
Subject Detect Invisible Text (placed by tools which make searchable PDF)
Date Fri, 03 May 2019 15:02:56 GMT

I need to remove the (often low-quality) invisible text placed on images by
tools that use OCR to make searchable PDFs.

We use PDFBox ourselves to make searchable PDFs, and we call
setRenderingMode(RenderingMode.NEITHER) when we place the text to
make it invisible. We also use PDFBox's text stripper to remove text from

What I am not sure about is whether there is a way for the text stripper to
identify the characters that have been placed as invisible, so that only
those are removed in some cases.
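
As a rough sketch of what such a filter might look like (an assumption for
illustration, not a confirmed PDFBox recipe): a PDFTextStripper subclass could
override processTextPosition and consult the text rendering mode in the current
graphics state, dropping glyphs drawn with RenderingMode.NEITHER. The class
name VisibleTextStripper is hypothetical. The check only covers text hidden via
the rendering mode; text hidden by other means (clipping, white fill, being
covered by an image) would not be detected.

```java
import java.io.IOException;

import org.apache.pdfbox.pdmodel.graphics.state.RenderingMode;
import org.apache.pdfbox.text.PDFTextStripper;
import org.apache.pdfbox.text.TextPosition;

/**
 * Hypothetical helper: a text stripper that ignores glyphs whose text
 * rendering mode is NEITHER (mode 3, neither filled nor stroked).
 */
public class VisibleTextStripper extends PDFTextStripper {

    public VisibleTextStripper() throws IOException {
        super();
    }

    @Override
    protected void processTextPosition(TextPosition text) {
        // The graphics state is the one in effect while this glyph is shown,
        // so it reflects the most recent "Tr" operator in the content stream.
        RenderingMode mode = getGraphicsState().getTextState().getRenderingMode();
        if (mode == RenderingMode.NEITHER) {
            return; // invisible glyph: do not hand it to the stripper
        }
        super.processTextPosition(text);
    }
}
```

With an open PDDocument named document, new VisibleTextStripper().getText(document)
would then return only the text that is not in rendering mode NEITHER.
Inverting the condition would collect just the invisible characters instead.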

Thanks for your help,
