lucene-java-user mailing list archives

From Nick Burch <>
Subject Re: Word Documents
Date Mon, 15 Dec 2003 14:04:06 GMT
On Mon, 15 Dec 2003, Pleasant, Tracy wrote:
> As a spinoff, I was wondering if anyone has been happy with indexing and
> searching Word docs. What about reading the contents? Any problems?

In the POI scratchpad, src/org/hdf/extractor has all the code you need to
pull the text out of a Word document. I use this, plus some simple HPSF
code to extract the document metadata, with Lucene, and it works great.
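Below is a minimal sketch of the same pipeline. It assumes current Apache POI (5.x) and Lucene (9.x) rather than the 2003 scratchpad code. POI's HWPF `WordExtractor`, the later replacement for the HDF code, supplies the body text, and its `getSummaryInformation()` returns the HPSF summary metadata. The class name `WordIndexer`, the field names (`path`, `contents`, `title`, `author`) and the command-line layout are illustrative choices, not anything from the original message.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.poi.hpsf.SummaryInformation;
import org.apache.poi.hwpf.extractor.WordExtractor;

// Hypothetical example class (assumes POI 5.x and Lucene 9.x on the classpath).
public class WordIndexer {

    /** Builds a Lucene Document from the body text and summary metadata of a .doc file. */
    static Document toLuceneDocument(Path docFile) throws IOException {
        try (InputStream in = Files.newInputStream(docFile);
             WordExtractor extractor = new WordExtractor(in)) {
            Document doc = new Document();
            // Exact-match identifier, stored so search hits can point back to the file.
            doc.add(new StringField("path", docFile.toString(), Field.Store.YES));
            // Full body text: tokenized for searching, but not stored in the index.
            doc.add(new TextField("contents", extractor.getText(), Field.Store.NO));

            // HPSF summary metadata; the whole block or individual properties may be absent.
            SummaryInformation si = extractor.getSummaryInformation();
            if (si != null) {
                addIfPresent(doc, "title", si.getTitle());
                addIfPresent(doc, "author", si.getAuthor());
            }
            return doc;
        }
    }

    // TextField rejects null values, so missing metadata is simply skipped.
    private static void addIfPresent(Document doc, String name, String value) {
        if (value != null && !value.isEmpty()) {
            doc.add(new TextField(name, value, Field.Store.YES));
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: index directory; args[1..]: .doc files to index (hypothetical CLI layout).
        try (Directory dir = FSDirectory.open(Paths.get(args[0]));
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            for (int i = 1; i < args.length; i++) {
                writer.addDocument(toLuceneDocument(Paths.get(args[i])));
            }
        }
    }
}
```

Pass the index directory first and the .doc files after it. `contents` is left unstored to keep the index small, while `path`, `title` and `author` are stored so they can be displayed with search results.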
