lucene-java-user mailing list archives

From "Ryan Ackley" <ryanack...@gmail.com>
Subject Re: index word files ( doc )
Date Sun, 25 Mar 2007 23:36:14 GMT
Yes, I do have plans to add fast-save support and support for more
file formats. I expect this to happen within the next couple of
months.

I'm playing with the idea of offering a commercial version. I want to
continue to support the open source community, so I want to keep it
open source or free and add value that people would be willing to pay
for.

Any comments on this are appreciated. One thing I thought of would be
to continue to offer the text extraction as open source but add HTML
conversion with hit highlighting for a variety of file formats as a
commercial add-on. Is this something anyone would pay for? What are
some other pain points of the Lucene community besides text
extraction?

On 3/25/07, Antony Bowesman <adb@teamware.com> wrote:
> I've been using Ryan's textmining in preference to POI, as internally TM uses
> POI and the Word6 extractor, so it handles a greater variety of files.
>
> Ryan, thanks for fixing your site.  Do you have any plans/ideas on how to parse
> the 'fast-saved' files and any ideas on Word files older than the Word 6 format?
>
> Regards
> Antony
>
>
> Ryan Ackley wrote:
> > As the author of both Word POI and textmining.org, I recommend using
> > textmining.org. POI is for general purpose manipulation of Word
> > documents. textmining's only purpose is extracting text.
> >
> > Also, people recommend using POI for text extraction but the only
> > place I've seen an actual how-to on this is in the "Lucene in Action"
> > book.
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
