lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From <Rod.Mad...@ferguson.com>
Subject RE: PDFBox PDFExtractor
Date Mon, 12 Sep 2005 16:06:50 GMT
Thanks for reply Jeroen ...does anyone have any
experience / comments regarding the use of PDFTextStream
versus PDFExtractor for working with PDF files ...the
issue for us is that there appears to be very high
memory usage when we work with PDF's using PDFExtractor.

I have heard that PDFTextStream may be a better solution.

Rod

-----Original Message-----
From: Jeroen Reijn [mailto:j.reijn@hippo.nl] 
Sent: Monday, September 12, 2005 11:58 AM
To: java-user@lucene.apache.org
Subject: Re: PDFBox PDFExtractor

Hi Rod,

PDFBox is a seperate project. The PDFExtractor in Jakarta Slide uses
PDFBox's 
functionality to extract the information from the .pdf file.

Hope this answers your question.

Jeroen


Rod.Madden@ferguson.com wrote:
> Hi,
> 
>  
> 
> I am new to Lucene and looking at some existing Lucene code....
> 
>  
> 
> I am confused about the relationship ( if any ) between 
> 
> org.apache.slide.extractor.PDFExtractor methods and org.PDFBox.cos
> methods
> 
> for the purposes of working with PDF files.
> 
>  
> 
> I have found info on the web regarding PDFBox, however, I have found
> little
> 
> regarding .PDFExtractor.
> 
>  
> 
> I am curious since we are having some issues with indexing PDF files
and
> 
> I am wondering if PDFExtractor implements PDFBox or if it is a
separate 
> 
> utility set.
> 
>  
> 
> Rod.
> 
>  
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message