lucene-java-user mailing list archives

From "Ryan Ackley" <ryanack...@gmail.com>
Subject Re: index word files ( doc )
Date Sat, 24 Mar 2007 21:58:42 GMT
As the author of both the Word part of POI and textmining.org, I recommend
using textmining.org. POI is for general-purpose manipulation of Word
documents; textmining.org's only purpose is extracting text.

Also, people recommend using POI for text extraction, but the only
place I've seen an actual how-to on this is the "Lucene in Action"
book.

On 3/24/07, jafarim <jafarim@gmail.com> wrote:
> Can anyone make a comparison between the two, namely POI API and the one
> from textmining.org?
>
> On 3/24/07, Ryan Ackley <ryanackley@gmail.com> wrote:
> >
> > The site is down, but you can download the Word extractor library
> > directly here:
> >
> > http://www.textmining.org/textmining.zip
> >
> > Going to fix the site this weekend.
> >
> > On 3/24/07, Sami Siren <ssiren@gmail.com> wrote:
> > > Antony Bowesman wrote:
> > >
> > > >> Are there other solutions?
> > >
> > > There's also antiword [1], which can convert your .doc to plain text or
> > > PostScript, though I'm not sure how good it is.
> > >
> > > --
> > >  Sami Siren
> > >
> > > [1] http://www.winfield.demon.nl/
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
>
