lucene-java-user mailing list archives

From jafarim <jafa...@gmail.com>
Subject Re: index word files ( doc )
Date Mon, 26 Mar 2007 08:38:32 GMT
It's worth knowing that the commercial feature you have in mind is already
available as open source in Enhydra Snapper.
Check here: http://www.enhydra.org/apps/snapper/index.html

On 3/26/07, Ryan Ackley <ryanackley@gmail.com> wrote:
>
> Yes I do have plans for adding fast save support and support for more
> file formats. The time frame for this happening is the next couple of
> months.
>
> I'm playing with the idea of offering a commercial version. I want to
> continue to support the open source community so I want to keep it
> open source or free and add value that people would be willing to pay
> for.
>
> Any comments on this are appreciated. One thing I thought of would be
> to continue to offer the text extraction as open source but add HTML
> conversion with hit highlighting for a variety of file formats as a
> commercial add-on. Is this something anyone would pay for? What are
> some other pain points of the Lucene community besides text
> extraction?
>
> On 3/25/07, Antony Bowesman <adb@teamware.com> wrote:
> > I've been using Ryan's textmining in preference to POI, as internally TM uses
> > POI and the Word6 extractor and so handles a greater variety of files.
> >
> > Ryan, thanks for fixing your site.  Do you have any plans/ideas on how to parse
> > the 'fast-saved' files and any ideas on Word files older than the Word 6 format?
> >
> > Regards
> > Antony
> >
> >
> > Ryan Ackley wrote:
> > > As the author of both Word POI and textmining.org, I recommend using
> > > textmining.org. POI is for general purpose manipulation of Word
> > > documents. textmining's only purpose is extracting text.
> > >
> > > Also, people recommend using POI for text extraction but the only
> > > place I've seen an actual how-to on this is in the "Lucene in Action"
> > > book.
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
