From: Otis Gospodnetic
To: Lucene Users List
Subject: Re: UNIX command-line indexing script?
Date: Mon, 15 Mar 2004 02:39:39 -0800 (PST)
Message-ID: <20040315103939.45704.qmail@web12702.mail.yahoo.com>
In-Reply-To: <88937119-766C-11D8-8BA1-000393A564E6@ehatchersolutions.com>

To add to this: the upcoming Lucene in Action book has ready-to-use code
that will handle and index files in most popular file formats.

Otis

--- Erik Hatcher wrote:
> Have a look at the Ant task in the Lucene sandbox. You're on your own,
> currently, to build this and understand it, but I use it frequently.
> In fact, the sample index from our book is generated with this:
>
>     documenthandler="lia.common.TestDataDocumentHandler">
>
> You can plug in your own DocumentHandler implementation to index
> different document types however you like. The default one indexes
> .txt and .html files, but a custom implementation can do its own thing.
> Again, to write a DocumentHandler that knows about various document
> types is not hard, but you will have to write your own at the moment.
>
> Despite the (minor) amount of work you'll have to do to start using
> it, the infrastructure adds a lot of value: an incremental file
> system indexer (only new docs get indexed on successive runs).
> Plugging this into cron would be trivial.
>
> Erik
>
> On Mar 13, 2004, at 11:45 AM, Charlie Smith wrote:
>
> > Anyone written a simple UNIX command-line indexing script which will
> > read a bunch of different kinds of docs and index them? I'd like to
> > make a cron job out of this so as to be able to come back and read it
> > later during a search.
> >
> > PERL or JAVA script would be fine.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
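
The two ideas Erik describes, a pluggable handler per document type and an incremental run that skips files already indexed, can be sketched in plain Java. This is only a rough illustration, not the sandbox Ant task and not Lucene's API: IncrementalIndexSketch, TextExtractor, register, and run are hypothetical names, and treating a file as needing indexing when it is new or its timestamp has advanced is an assumption (the message only says that new docs get indexed on successive runs).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Lucene or the sandbox Ant task.
class IncrementalIndexSketch {

    // Hypothetical stand-in for a per-format handler: raw file content in, indexable text out.
    interface TextExtractor {
        String extract(String rawContent);
    }

    private final Map<String, TextExtractor> extractors = new HashMap<>();
    // Path -> last-modified timestamp recorded when the file was last indexed.
    private final Map<String, Long> indexedAt = new HashMap<>();

    void register(String extension, TextExtractor extractor) {
        extractors.put(extension.toLowerCase(), extractor);
    }

    // Returns the extractor registered for the file's extension, or null if there is none.
    TextExtractor extractorFor(String path) {
        int dot = path.lastIndexOf('.');
        return dot < 0 ? null : extractors.get(path.substring(dot + 1).toLowerCase());
    }

    // Assumption: a file needs indexing if it is new or modified since it was last indexed.
    boolean needsIndexing(String path, long lastModified) {
        Long previous = indexedAt.get(path);
        return previous == null || previous < lastModified;
    }

    // One indexing pass over (path -> last-modified); returns the paths handled in this pass.
    List<String> run(Map<String, Long> files) {
        List<String> handled = new ArrayList<>();
        for (Map.Entry<String, Long> file : files.entrySet()) {
            String path = file.getKey();
            if (extractorFor(path) == null || !needsIndexing(path, file.getValue())) {
                continue; // unsupported type, or unchanged since the previous pass
            }
            // A real indexer would read the file, call extract(...), and add the text to the index here.
            indexedAt.put(path, file.getValue());
            handled.add(path);
        }
        return handled;
    }

    public static void main(String[] args) {
        IncrementalIndexSketch sketch = new IncrementalIndexSketch();
        sketch.register("txt", raw -> raw.trim());
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("notes.txt", 100L);
        files.put("report.pdf", 100L); // no extractor registered, so it is skipped
        System.out.println(sketch.run(files)); // first pass
        System.out.println(sketch.run(files)); // second pass over unchanged files
    }
}
```

Supporting another document type in this sketch means registering one more extractor, and a cron job would only need to invoke the same pass on a schedule, since unchanged files fall through the needsIndexing check.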