lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: MS-Word docs.
Date Sun, 27 Nov 2005 17:16:17 GMT
Hello Steven,

There is a small ready-to-use framework in Lucene in Action that you can
use to index MS Word, PDF, RTF, XML, and plain-text docs -
http://lucenebook.com/ .  I suggest you stick with the POI libraries, as
it looks like the Textmining code is no longer maintained.
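
A minimal sketch of the general shape such a framework could take,
assuming a design where a text extractor is chosen by file extension.
This is not the code from the book: TextExtractor and
ExtensionDispatchSketch are hypothetical names, and the Word extractor
is left as a comment because it would need the POI jar on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical interface: turns raw file bytes into plain text for indexing.
// A real extractor would wrap any IOException in an unchecked exception.
interface TextExtractor {
    String extract(byte[] data);
}

// Hypothetical dispatcher: picks an extractor based on the file extension.
final class ExtensionDispatchSketch {
    private final Map<String, TextExtractor> extractors = new HashMap<>();

    // Registers an extractor for an extension such as "txt" or "doc".
    void register(String extension, TextExtractor extractor) {
        extractors.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    // Returns the lower-cased extension, or "" if the name has no dot.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Extracts text with the registered extractor, or fails loudly.
    String extract(String fileName, byte[] data) {
        TextExtractor extractor = extractors.get(extensionOf(fileName));
        if (extractor == null) {
            throw new IllegalArgumentException("No extractor registered for: " + fileName);
        }
        return extractor.extract(data);
    }

    public static void main(String[] args) {
        ExtensionDispatchSketch dispatcher = new ExtensionDispatchSketch();
        // Plain text: decode the bytes directly (UTF-8 is an assumption).
        dispatcher.register("txt", data -> new String(data, StandardCharsets.UTF_8));
        // A "doc" extractor would be registered here, delegating to POI.

        byte[] sample = "plain text body".getBytes(StandardCharsets.UTF_8);
        String text = dispatcher.extract("notes.txt", sample);
        // The extracted string is what would go into the "content" field.
        System.out.println(text);
    }
}
```

Keeping extraction separate from indexing makes it easy to print the
extracted text for each file type before it ever reaches the index.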

Otis

--- Steven Bell <sfbell@keasdesign.net> wrote:

> Hi,
> 
> I am stumped. I can't seem to get Word docs indexed. I have tried both
> POI and textmining libraries to little or no real effect.
> I dump the doc files into a text file with the same variable I use in
> the Lucene doc.add(Field.UnStored("content", textStr)); call, and they
> look fine in the file. However, searches return nothing.
> 
> Is there a good, solid tutorial on how to best accomplish the
> indexing 
> and searching of word documents?
> 
> Thanks.
> Steve
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 

