lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Victor Hadianto <vict...@nuix.com.au>
Subject Re: PDF / Word document parsers
Date Fri, 19 Apr 2002 06:38:16 GMT
> I have been looking for PDF and Word document parsers.  I have tried the
> contributions page on the Lucene site as suggested by a Lucene User. The
> PJEtymon does not have a Windows version.  The XPDF does not do the parsing
> very well.

I've run Etymon with some degree of success in window boxes. To parse word 
document you can have a look for OpenOffice. You can start OpenOffice to 
receive a socket connection. From your Java app, you open a connection to 
OpenOffice (using OpenOffice SDK), send the word document and it will convert 
it to text.

You can also use OpenOffice various other parsing. The url: www.openoffice.org

Note: I've never tried OpenOffice under windows, so I'm not sure how it will 
work, but we are using it here to index our word documents.

Regards,

-- 
Victor Hadianto
---------------
More are taken in by hope than by cunning. -- Vauvenargues

--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message