pdfbox-users mailing list archives

From David Picella <dpice...@gmail.com>
Subject pdfboxpreparator and Regain
Date Sun, 06 Dec 2009 15:05:32 GMT
I'm using the Regain search engine, which is powered by Lucene.

It integrates with PDFBox through a special indexing preparator called PdfBoxPreparator.

Does anyone know whether PdfBoxPreparator extracts data from the title,
author, and keywords fields of a PDF? Also, which PDF versions are
compatible? Thank you!

Here is a post I submitted to the Regain forum, but I have not heard back yet:

I am saving "scanned" PDFs on my system, so there is no extractable text
in the body. I am looking for a good way to find these documents, and I
was thinking I could do so by editing the title, keywords, and author
fields in each PDF.
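A minimal sketch of the idea above, assuming the Apache PDFBox 2.x API (other releases differ; 3.x, for example, replaces `PDDocument.load` with `Loader.loadPDF`). It creates a page with no text, as a stand-in for a scan, stores title, author, and keywords in the PDF's document information dictionary, and reads them back the way a metadata-aware indexer might. The class `PdfInfoRoundTrip`, its methods, and the sample values are hypothetical, and whether Regain's PdfBoxPreparator actually indexes these fields is exactly the open question here, not something this code confirms.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.pdmodel.PDPage;

/**
 * Hypothetical helper: stores searchable metadata in a PDF that has no
 * text layer, then reads it back. Assumes the PDFBox 2.x API.
 */
public class PdfInfoRoundTrip {

    /** Builds a one-page PDF with no text and fills its document information dictionary. */
    public static byte[] makeTaggedPdf(String title, String author, String keywords)
            throws IOException {
        try (PDDocument doc = new PDDocument()) {
            // A blank page stands in for a scanned image: nothing for a text extractor to find.
            doc.addPage(new PDPage());

            // In 2.x this returns the existing info dictionary, creating one if needed.
            PDDocumentInformation info = doc.getDocumentInformation();
            info.setTitle(title);
            info.setAuthor(author);
            info.setKeywords(keywords);
            doc.setDocumentInformation(info);

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            doc.save(out);
            return out.toByteArray();
        }
    }

    /** Reads title, author, and keywords back; each getter returns null if the field is absent. */
    public static String[] readInfo(byte[] pdfBytes) throws IOException {
        // PDFBox 2.x; in 3.x use org.apache.pdfbox.Loader.loadPDF(pdfBytes) instead.
        try (PDDocument doc = PDDocument.load(pdfBytes)) {
            PDDocumentInformation info = doc.getDocumentInformation();
            return new String[] { info.getTitle(), info.getAuthor(), info.getKeywords() };
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample values are made up for illustration.
        byte[] pdf = makeTaggedPdf("Invoice 2009-117", "J. Smith", "invoice; scanned; acme");
        String[] fields = readInfo(pdf);
        System.out.println("Title:    " + fields[0]);
        System.out.println("Author:   " + fields[1]);
        System.out.println("Keywords: " + fields[2]);
    }
}
```

To tag an existing scan instead of a generated page, the same setters would apply to a document loaded from disk, followed by `doc.save(file)`.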

