lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mikael Söderman <mik...@excito.se>
Subject Re: Extracting Complete Text from PDF using Lucene and JPEDAL!!!!
Date Mon, 14 Oct 2002 10:37:08 GMT
Hi Vin!

With JPedal you process one page at a time by calling the method decodePage
and supply the number of the page you want to process as argument.

In the example ExtractTextObjects the total number of pages is hard-coded to
1 (the variable end is set to 1 in the constructor), try to set the number
of pages by using the getPageCount method instead.

Best regards

Mikael Söderman

PS. Don't forget to always call flushObjectValues when done with a page.
This will make JPedal reuse memory.


----- Original Message -----
From: "Vinod Bhagat" <vbhagat@blastradius.com>
To: "'Lucene Users List'" <lucene-user@jakarta.apache.org>
Sent: Monday, October 14, 2002 11:26 AM
Subject: Extracting Complete Text from PDF using Lucene and JPEDAL!!!!


> Dear People
>
>   I am using Lucene and one of the requirement is to index PDF. I am using
> JPEDAL's  API to extract text from PDF.  Till now i manage to get the text
> of the first page, I am using the ExtractTextObject.java class to do the
> above. But i want to extract the complete text of the PDF file. Have
anyone
> done this and possible could guide me towards it.
>
>  Appritiate for your positive and quick reply.
>
>  Cheers
> Vin.
>
> --
> To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>
>
>



--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message