lucene-java-user mailing list archives

From Victor Hadianto <>
Subject Re: PDF / Word document parsers
Date Fri, 19 Apr 2002 06:38:16 GMT
> I have been looking for PDF and Word document parsers.  I have tried the
> contributions page on the Lucene site as suggested by a Lucene User. The
> PJEtymon does not have a Windows version.  The XPDF does not do the parsing
> very well.

I've run Etymon with some degree of success on Windows boxes. To parse Word
documents, have a look at OpenOffice. You can start OpenOffice so that it
listens for a socket connection. From your Java app, you open a connection to
OpenOffice (using the OpenOffice SDK), send it the Word document, and it will
convert it to text.
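
A rough sketch of the connection side, in case it helps. It only builds the
start-up argument and the UNO URL and checks whether anything is listening;
the actual conversion goes through the OpenOffice SDK and is not shown. The
class and method names are made up for illustration, and host "localhost",
port 8100, and the "StarOffice.ServiceManager" object name are assumptions
based on the usual OpenOffice examples, so check them against your install.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of the OpenOffice SDK.
public class OfficeSocketSketch {

    // Argument that asks OpenOffice to listen on a socket.
    // Assumption: older releases take one dash (-accept=...),
    // newer ones take two (--accept=...).
    static String acceptArgument(String host, int port) {
        return "-accept=socket,host=" + host + ",port=" + port + ";urp;";
    }

    // UNO URL a client would pass to the SDK's URL resolver.
    static String unoUrl(String host, int port) {
        return "uno:socket,host=" + host + ",port=" + port
                + ";urp;StarOffice.ServiceManager";
    }

    // True if something accepts a TCP connection on host:port in time.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String host = "localhost"; // assumed host
        int port = 8100;           // assumed port
        System.out.println("Start with: soffice \"" + acceptArgument(host, port) + "\"");
        System.out.println("Connect to: " + unoUrl(host, port));
        System.out.println("Listening:  " + isListening(host, port, 500));
    }
}
```

Checking the port first gives a clearer error than a failed SDK call when
OpenOffice has not been started with the accept argument.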

You can also use OpenOffice for various other kinds of parsing. The URL:

Note: I've never tried OpenOffice under Windows, so I'm not sure how it will
work there, but we are using it here to index our Word documents.


Victor Hadianto
More are taken in by hope than by cunning. -- Vauvenargues
