pdfbox-users mailing list archives

From Brad Stallion <bradstall...@yahoo.com>
Subject Exclude text from invisible layouts
Date Mon, 18 Mar 2013 11:15:18 GMT
Hi all,

I asked this on the Tika mailing list and was told to ask the PDFBox team:

I'm extracting text from PDF files using my own SAX handler. The problem is that I get both
visible and invisible text, i.e. text contained in invisible parts of the layout.
How can I identify the invisible parts?
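
For context, here is a minimal sketch of the kind of handler involved. It is not my actual code: the class name TextCollector and the sample markup are hypothetical, and the events here come from the JDK's own SAX parser reading a string rather than from Tika's PDF parser. It only illustrates that characters() receives raw text with no visibility flag attached, so the handler alone cannot tell visible text from invisible text.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects every character event it receives.
class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only the characters arrive here; nothing says whether
        // they were rendered visibly in the source document.
        text.append(ch, start, length);
    }

    String getText() {
        return text.toString();
    }

    // Helper for illustration: feeds an XHTML-like string through SAX.
    static String extract(String xhtml) {
        try {
            TextCollector handler = new TextCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.getText();
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
    }
}
```

Calling TextCollector.extract("<div><p>visible</p><p>hidden</p></div>") returns both paragraphs' text concatenated, which mirrors the behaviour described above.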

I've also asked on Stack Overflow:

http://stackoverflow.com/questions/14956556/tika-and-invisible-text-from-pdf

Thanks a lot for your help!