pdfbox-users mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: Support for Lucene 3.0 in LucenePDFDocument.getDocument
Date Wed, 02 Dec 2009 10:05:04 GMT

On Wed, Dec 2, 2009 at 9:11 AM,  <Frank.Pientka@materna.de> wrote:
> when I use pdfbox-0.8.0-incubating with Lucene 3.0, 2.9 I get warnings
> on the command line
> [...]
> INFO: unsupported/disabled operation: rg
> org.apache.pdfbox.util.PDFStreamEngine processOperator
> LucenePDFDocument.addContent(Document, InputStream, String) line: 413

Yes, I see that too and I'm a bit annoyed by it. There's actually no
reason for this warning as those operations are not relevant to text
extraction and thus PDFBox does the right thing by just ignoring them.
We just need to disable the log messages for such cases.
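Until that happens, an application can probably silence these messages on its own side. The sketch below assumes two things: that commons-logging is delegating to java.util.logging (the "INFO:" output format quoted above suggests it is), and that the logger is named after the emitting class, org.apache.pdfbox.util.PDFStreamEngine. The class name QuietPdfBoxLogging is hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper that hides PDFStreamEngine's INFO-level chatter. */
public final class QuietPdfBoxLogging {
    // Assumption: commons-logging routes to java.util.logging and PDFBox
    // names its logger after the class that emits the message.
    static final String STREAM_ENGINE_LOGGER = "org.apache.pdfbox.util.PDFStreamEngine";

    // Keep a strong reference: java.util.logging holds loggers weakly, so a
    // level set on an otherwise unreferenced logger can be lost after GC.
    private static final Logger STREAM_ENGINE = Logger.getLogger(STREAM_ENGINE_LOGGER);

    private QuietPdfBoxLogging() {}

    /** Drops INFO messages such as "unsupported/disabled operation: rg". */
    public static void quietStreamEngine() {
        STREAM_ENGINE.setLevel(Level.WARNING);
    }

    public static void main(String[] args) {
        quietStreamEngine();
        // ... run LucenePDFDocument or other text extraction here ...
    }
}
```

The same effect should be achievable without code through a logging.properties entry such as `org.apache.pdfbox.util.PDFStreamEngine.level = WARNING`, again assuming the logger name above.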

Can you file an improvement request about this in
https://issues.apache.org/jira/browse/PDFBOX? I can take it from there.

> I have to use commons-logging, otherwise I get a Class not found
> Exception, but logging slows down processing

The commons-logging dependency is a bit troublesome in terms of
classpath handling. Perhaps PDFBox should use the standard
java.util.logging instead. Can you file an improvement request for
that as well? We'll need to discuss it on dev@pdfbox.
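To illustrate what a switch to java.util.logging might look like inside the library, here is a minimal sketch. OperatorDispatcher and its methods are hypothetical stand-ins, not PDFBox classes. It also shows one way to address both complaints in this thread: ignored operators are logged at FINE rather than INFO, and an isLoggable guard avoids building log messages when that level is off.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical operator dispatcher that logs through java.util.logging only. */
public final class OperatorDispatcher {
    // No third-party logging jar is needed: java.util.logging ships with the JDK.
    private static final Logger LOG = Logger.getLogger(OperatorDispatcher.class.getName());

    private final Set<String> supported;

    public OperatorDispatcher(Set<String> supportedOperators) {
        this.supported = Collections.unmodifiableSet(new HashSet<>(supportedOperators));
    }

    /** Returns true if the operator is handled, false if it is ignored. */
    public boolean process(String operator) {
        if (supported.contains(operator)) {
            return true;
        }
        // Operators such as "rg" (set RGB non-stroking color) do not affect
        // text extraction, so log at FINE instead of INFO. The isLoggable
        // guard skips building the message when FINE is disabled, which is
        // the default with the JDK's standard configuration.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("ignoring operator: " + operator);
        }
        return false;
    }
}
```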

> My question is: when will PDFBox support the changed API from Lucene
> 3.0 in LucenePDFDocument.getDocument?

As soon as someone writes a patch with the required changes. :-) In
fact, I'd rather see us not depend on the Lucene API at all. A
better approach would be to make the LucenePDFDocument class (or a
more generically named alternative) simply return a Map of defined
key-value pairs that the client application can then turn into a
Lucene Document.
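A rough sketch of that idea follows. PdfFieldMap and its key names are hypothetical, and the values are assumed to have been extracted from the PDF already (for example with PDFBox). The point is that the returned map has no Lucene types in it, so only the client's conversion loop would need to change when the Lucene Field API changes.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical Lucene-independent alternative to LucenePDFDocument:
 * it returns plain key-value pairs instead of a Lucene Document.
 */
public final class PdfFieldMap {
    // Hypothetical key names; a real API would document a fixed set of keys.
    public static final String TITLE = "title";
    public static final String AUTHOR = "author";
    public static final String CONTENTS = "contents";

    private PdfFieldMap() {}

    /** Builds a read-only, ordered map; keys whose value is null are left out. */
    public static Map<String, String> toMap(String title, String author, String contents) {
        Map<String, String> fields = new LinkedHashMap<>();
        putIfPresent(fields, TITLE, title);
        putIfPresent(fields, AUTHOR, author);
        putIfPresent(fields, CONTENTS, contents);
        return Collections.unmodifiableMap(fields);
    }

    private static void putIfPresent(Map<String, String> fields, String key, String value) {
        if (value != null) {
            fields.put(key, value);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = toMap("Annual report", null, "Extracted text ...");
        // The client application maps each entry onto fields of whichever
        // Lucene version it uses; this class never touches the Lucene API.
        for (Map.Entry<String, String> e : fields.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```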

> Why must I use bcprov-jdk14-136.jar and bcmail-jdk14-136.jar just to
> check whether PDF documents are encrypted?

They are needed for actually encrypting or decrypting a document, but
I guess they should be (or at least we could make them) optional if
you just want to check whether the document is encrypted or not.
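One common way to make such a dependency optional is to probe for it at runtime and fail only when an operation that really needs it is requested. The sketch below is an illustration of that pattern, not existing PDFBox code: OptionalCrypto and its methods are hypothetical, and the only external name it relies on is BouncyCastle's provider class, org.bouncycastle.jce.provider.BouncyCastleProvider.

```java
/** Hypothetical helper that treats BouncyCastle as an optional dependency. */
public final class OptionalCrypto {
    private static final String BC_PROVIDER = "org.bouncycastle.jce.provider.BouncyCastleProvider";

    private OptionalCrypto() {}

    /** True if the BouncyCastle provider class can be found on the classpath. */
    public static boolean isBouncyCastleAvailable() {
        try {
            // initialize = false: only check presence, do not run static initializers.
            Class.forName(BC_PROVIDER, false, OptionalCrypto.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Call before decrypting or encrypting; a plain "is it encrypted?" check never needs this. */
    public static void requireBouncyCastle(String operation) {
        if (!isBouncyCastleAvailable()) {
            throw new IllegalStateException(
                "BouncyCastle (bcprov) is required for " + operation + " but is not on the classpath");
        }
    }
}
```

With this approach, code that only inspects the document's encryption status would never call requireBouncyCastle, so the bcprov and bcmail jars would only be needed by applications that actually encrypt or decrypt.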


Jukka Zitting
