lucene-java-user mailing list archives

From "Ryan Ackley" <ryanack...@gmail.com>
Subject Re: index word files ( doc )
Date Mon, 26 Mar 2007 09:41:20 GMT
That is good to know, thank you. Looking at their documentation, their
preview seems to show the contents of the index for a particular file,
which you can then transform using XML. I can see how this would be
useful. What I was proposing was a conversion from the binary format
to HTML that preserves the rich formatting.

On 3/26/07, jafarim <jafarim@gmail.com> wrote:
> Good to know that the commercial feature you proposed is already offered by
> Enhydra Snapper as an open-source feature.
> Check here: http://www.enhydra.org/apps/snapper/index.html
>
> On 3/26/07, Ryan Ackley <ryanackley@gmail.com> wrote:
> >
> > Yes, I do have plans for adding fast-save support and support for more
> > file formats. The time frame for this is the next couple of
> > months.
> >
> > I'm playing with the idea of offering a commercial version. I want to
> > continue to support the open source community so I want to keep it
> > open source or free and add value that people would be willing to pay
> > for.
> >
> > Any comments on this are appreciated. One thing I thought of would be
> > to continue to offer the text extraction as open source but add HTML
> > conversion with hit highlighting for a variety of file formats as a
> > commercial add-on. Is this something anyone would pay for? What are
> > some other pain points of the Lucene community besides text
> > extraction?
> >
> > On 3/25/07, Antony Bowesman <adb@teamware.com> wrote:
> > > I've been using Ryan's textmining in preference to POI, as internally TM uses
> > > POI and the Word6 extractor, so it handles a greater variety of files.
> > >
> > > Ryan, thanks for fixing your site. Do you have any plans/ideas on how to parse
> > > the 'fast-saved' files and any ideas on Word files older than the Word 6 format?
> > >
> > > Regards
> > > Antony
> > >
> > >
> > > Ryan Ackley wrote:
> > > > As the author of both Word POI and textmining.org, I recommend using
> > > > textmining.org. POI is for general purpose manipulation of Word
> > > > documents. textmining's only purpose is extracting text.
> > > >
> > > > Also, people recommend using POI for text extraction but the only
> > > > place I've seen an actual how-to on this is in the "Lucene in Action"
> > > > book.
> > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
>


