lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: UNIX command-line indexing script?
Date Mon, 15 Mar 2004 18:29:58 GMT
Erik and I are putting finishing touches on it, so by Summer (this one
;)).

Otis

--- Charlie Smith <SmithCW@ldschurch.org> wrote:
> So, how upcoming is this book going to be?
> 
> >>> otis_gospodnetic@yahoo.com 3/15/2004 3:39:39 AM >>>
> To add to this.
> The upcoming Lucene in Action book has ready to use code that will
> handle and index files in most popular file formats.
> 
> Otis
> 
> --- Erik Hatcher <erik@ehatchersolutions.com> wrote:
> > Have a look at the Ant <index> task in the Lucene sandbox.  You're
> on
> > 
> > your own, currently, to build this and understand it, but I use it 
> > frequently.  In fact, the sample index from our book is generated
> > with 
> > this:
> > 
> >      <index index="${build.dir}/index"
> >        documenthandler="lia.common.TestDataDocumentHandler">
> >        <fileset dir="${data.dir}"/>
> >        <config basedir="${data.dir}"/>
> >      </index>
> > 
> > You can plug in your own DocumentHandler implementation to index 
> > different document types however you like.  The default one indexes
> 
> > .txt and .html files, but a custom implementation can do its own
> > thing. 
> >   Again, to write a DocumentHandler that knows about various
> document
> > 
> > types is not hard you will have to write your own at the moment.
> > 
> > Despite the (minor) amount of work you'll have to do to start using
> 
> > <index> - the infrastructure adds a lot of value: an incremental
> file
> > 
> > system indexer (only new docs get indexed on successive runs).  
> > Plugging this into cron would be trivial.
> > 
> > 	Erik
> > 
> > On Mar 13, 2004, at 11:45 AM, Charlie Smith wrote:
> > 
> > > Anyone written a simple UNIX command-line indexing script which
> > will 
> > > read a
> > > bunch off different kinds of docs and index them?  I'd like to
> make
> > a 
> > > cron job
> > > out of this so as to be able to come back and read it later
> during
> > a 
> > > search.
> > >
> > > PERL or JAVA script would be fine.
> > >
> > >
> > 
> > 
> >
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org 
> > For additional commands, e-mail:
> lucene-user-help@jakarta.apache.org
> > 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org 
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org 
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message