lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: UNIX command-line indexing script?
Date Mon, 15 Mar 2004 10:39:39 GMT
To add to this.
The upcoming Lucene in Action book has ready-to-use code that will
handle and index files in the most popular file formats.

Otis

--- Erik Hatcher <erik@ehatchersolutions.com> wrote:
> Have a look at the Ant <index> task in the Lucene sandbox.  You're on
> your own, currently, to build this and understand it, but I use it
> frequently.  In fact, the sample index from our book is generated
> with this:
> 
>      <index index="${build.dir}/index"
>        documenthandler="lia.common.TestDataDocumentHandler">
>        <fileset dir="${data.dir}"/>
>        <config basedir="${data.dir}"/>
>      </index>
> 
> You can plug in your own DocumentHandler implementation to index
> different document types however you like.  The default one indexes
> .txt and .html files, but a custom implementation can do its own
> thing.  Again, writing a DocumentHandler that knows about various
> document types is not hard, but you will have to write your own at
> the moment.
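
A minimal sketch of the "handler per document type" idea, using only
the Java standard library. FileHandler and ExtensionDispatchHandler are
hypothetical names made up for illustration; this is not the sandbox
DocumentHandler interface, and it returns a plain map of field names to
text rather than a Lucene Document. The HTML handling is a crude tag
strip, assumed good enough for a demo only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a document handler (NOT the sandbox interface):
// turns one file into named text fields that an indexer could store.
interface FileHandler {
    Map<String, String> toFields(Path file) throws IOException;
}

// Picks a handler by file extension and delegates to it.
class ExtensionDispatchHandler implements FileHandler {
    private final Map<String, FileHandler> byExtension = new HashMap<>();

    void register(String extension, FileHandler handler) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), handler);
    }

    // Lower-cased text after the last dot, or "" when there is no dot.
    static String extensionOf(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Returns null for file types with no registered handler, so the
    // caller can skip them.
    @Override
    public Map<String, String> toFields(Path file) throws IOException {
        FileHandler handler = byExtension.get(extensionOf(file));
        return handler == null ? null : handler.toFields(file);
    }

    private static Map<String, String> fields(Path file, String contents) {
        Map<String, String> f = new LinkedHashMap<>();
        f.put("path", file.toString());
        f.put("contents", contents);
        return f;
    }

    private static String read(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Covers the same two types as the default handler mentioned above.
    static ExtensionDispatchHandler withDefaults() {
        ExtensionDispatchHandler d = new ExtensionDispatchHandler();
        d.register("txt", file -> fields(file, read(file)));
        // Crude tag strip for illustration; a real handler would use an HTML parser.
        d.register("html", file -> fields(file,
                read(file).replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim()));
        return d;
    }
}
```

Supporting another format would then be one more register(...) call
with a handler that extracts text from that format.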
> 
> Despite the (minor) amount of work you'll have to do to start using
> <index> - the infrastructure adds a lot of value: an incremental file
> system indexer (only new docs get indexed on successive runs).
> Plugging this into cron would be trivial.
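
To illustrate what "incremental" means here, the following is a sketch
of one way to get that behaviour. It is not necessarily how the <index>
task does it. It assumes a small manifest file (a hypothetical
java.util.Properties file kept outside the data directory) that records
each file's last-modified time; IncrementalScan and pendingFiles are
made-up names, and the actual indexing step is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: reports which files still need indexing.
class IncrementalScan {

    // Returns files under dataDir that are new or modified since the last
    // call, then records them in the manifest. Assumes the manifest lives
    // outside dataDir. A real indexer would update the manifest only after
    // the files were indexed successfully.
    static List<Path> pendingFiles(Path dataDir, Path manifest) throws IOException {
        Properties seen = new Properties();
        if (Files.exists(manifest)) {
            try (InputStream in = Files.newInputStream(manifest)) {
                seen.load(in);
            }
        }

        List<Path> files;
        try (Stream<Path> walk = Files.walk(dataDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }

        List<Path> pending = new ArrayList<>();
        for (Path file : files) {
            String key = file.toString();
            String modified = Long.toString(Files.getLastModifiedTime(file).toMillis());
            // Unknown path or changed timestamp: the file needs (re)indexing.
            if (!modified.equals(seen.getProperty(key))) {
                pending.add(file);
                seen.setProperty(key, modified);
            }
        }

        try (OutputStream out = Files.newOutputStream(manifest)) {
            seen.store(out, "files already handed to the indexer");
        }
        return pending;
    }
}
```

A scheduled job would call pendingFiles on each run and index only the
returned files, so an unchanged directory costs a directory walk and
nothing more.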
> 
> 	Erik
> 
> On Mar 13, 2004, at 11:45 AM, Charlie Smith wrote:
> 
> > Anyone written a simple UNIX command-line indexing script which
> > will read a bunch of different kinds of docs and index them?  I'd
> > like to make a cron job out of this so as to be able to come back
> > and read the index later during a search.
> >
> > A Perl or Java script would be fine.
> >
> >
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
