lucene-java-user mailing list archives

From petite_abeille <petite_abei...@mac.com>
Subject Re: indexing PDF files
Date Wed, 01 May 2002 07:15:08 GMT
On Tuesday, April 30, 2002, at 10:46 PM, Otis Gospodnetic wrote:

> Hm, this should be a FAQ.

Maybe it should... ;-)

> Check Lucene contributions page, there are some starting points there,

Well, this seems to be a very popular request... In fact, I need
something like that too. Unfortunately, there seems to be no
authoritative answer on converting PDF files to text in a pure
Java environment... Maybe I'm missing something here, as usual?

Also, on a related note, what would be a good approach to converting
an arbitrary document into PDF? I was thinking of a two-step process
for document indexing in Lucene:

- First, convert everything to PDF (with Acrobat or something).
- Second, convert the PDF to text and index it.
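
To make the two steps concrete, here is a rough sketch of how they could be
wired together. Everything in it is an assumption for illustration only:
TwoStepIndexer, PdfConverter and PdfTextExtractor are hypothetical names, the
two interfaces stand in for whatever converter and PDF parser eventually get
used, and a plain Map stands in for the Lucene index. No real library API is
shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class TwoStepIndexer {
    /** Hypothetical hook for step 1: turn an arbitrary document into PDF bytes. */
    interface PdfConverter {
        byte[] toPdf(String name, byte[] content);
    }

    /** Hypothetical hook for step 2: pull plain text out of PDF bytes. */
    interface PdfTextExtractor {
        String extractText(byte[] pdf);
    }

    private final PdfConverter converter;
    private final PdfTextExtractor extractor;
    // Stand-in for the Lucene index: document name -> extracted text.
    private final Map<String, String> indexed = new LinkedHashMap<>();

    TwoStepIndexer(PdfConverter converter, PdfTextExtractor extractor) {
        this.converter = converter;
        this.extractor = extractor;
    }

    void add(String name, byte[] content) {
        // Step 1: convert to PDF, unless the file already claims to be one.
        boolean alreadyPdf = name.toLowerCase(Locale.ROOT).endsWith(".pdf");
        byte[] pdf = alreadyPdf ? content : converter.toPdf(name, content);
        // Step 2: extract the text and hand it to the index.
        indexed.put(name, extractor.extractText(pdf));
    }

    Map<String, String> indexed() {
        return indexed;
    }

    public static void main(String[] args) {
        // Dummy hooks: they only tag and decode bytes, they do no real PDF work.
        TwoStepIndexer indexer = new TwoStepIndexer(
            (name, content) -> content,
            pdf -> new String(pdf, StandardCharsets.UTF_8));
        indexer.add("notes.txt", "some text".getBytes(StandardCharsets.UTF_8));
        System.out.println(indexer.indexed());
    }
}
```

The point of the two interfaces is that the indexing code only ever sees
text, so the converter and the PDF parser can be swapped independently once
suitable pure Java candidates turn up.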

Any practical suggestions on how to do that in a pure Java
environment are very welcome.

Thanks :-)

PA.

