From: Erik Hatcher
Subject: Re: UNIX command-line indexing script?
Date: Mon, 15 Mar 2004 05:35:52 -0500
To: "Lucene Users List"
Message-Id: <88937119-766C-11D8-8BA1-000393A564E6@ehatchersolutions.com>

Have a look at the Ant task in the Lucene sandbox. You're on your own, currently, to build this and understand it, but I use it frequently. In fact, the sample index from our book is generated with this:

You can plug in your own DocumentHandler implementation to index different document types however you like.
The default one indexes .txt and .html files, but a custom implementation can do its own thing. Again, writing a DocumentHandler that knows about various document types is not hard, but you will have to write your own at the moment.

Despite the (minor) amount of work you'll have to do to start using it, the infrastructure adds a lot of value: an incremental file system indexer (only new docs get indexed on successive runs). Plugging this into cron would be trivial.

	Erik

On Mar 13, 2004, at 11:45 AM, Charlie Smith wrote:

> Anyone written a simple UNIX command-line indexing script which will read a
> bunch of different kinds of docs and index them? I'd like to make a cron job
> out of this so as to be able to come back and read it later during a search.
>
> PERL or JAVA script would be fine.
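
The two ideas in the reply above (a pluggable handler per document type, and an incremental pass that only picks up new or changed files) can be sketched roughly as follows. This is an illustration under stated assumptions, not the sandbox task's code: it uses no Lucene classes, the "index" step is only a comment, and every name in it (IncrementalIndexSketch, TextExtractor, register, run, the two read helpers) is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustration only: no Lucene classes are used, and every name here is hypothetical.
class IncrementalIndexSketch {

    // Stand-in for a pluggable per-type handler (not the sandbox's DocumentHandler API).
    interface TextExtractor {
        String extract(Path file) throws IOException;
    }

    // File extension (including the dot) -> handler for that document type.
    private final Map<String, TextExtractor> handlers = new HashMap<>();
    // File -> last-modified time recorded when it was last indexed (in memory only).
    private final Map<Path, Long> lastIndexed = new HashMap<>();

    void register(String extension, TextExtractor handler) {
        handlers.put(extension.toLowerCase(Locale.ROOT), handler);
    }

    // Walks the tree under root and returns the files handled on this run,
    // i.e. those that are new or modified since the previous call.
    List<Path> run(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<Path> indexed = new ArrayList<>();
        for (Path file : files) {
            String name = file.getFileName().toString();
            int dot = name.lastIndexOf('.');
            TextExtractor handler =
                dot < 0 ? null : handlers.get(name.substring(dot).toLowerCase(Locale.ROOT));
            if (handler == null) {
                continue; // no handler registered for this document type
            }
            long modified = Files.getLastModifiedTime(file).toMillis();
            Long previous = lastIndexed.get(file);
            if (previous != null && previous == modified) {
                continue; // unchanged since the last run
            }
            String text = handler.extract(file);
            // A real indexer would hand `text` to the search index at this point.
            lastIndexed.put(file, modified);
            indexed.add(file);
        }
        return indexed;
    }

    static String readPlainText(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    // Very crude tag stripping, good enough for a sketch but not for real HTML.
    static String readHtmlCrudely(Path file) throws IOException {
        return readPlainText(file).replaceAll("<[^>]*>", " ");
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index-sketch");
        Files.write(dir.resolve("notes.txt"), "plain text".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("page.html"), "<p>markup</p>".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("scan.pdf"), "no handler".getBytes(StandardCharsets.UTF_8));

        IncrementalIndexSketch indexer = new IncrementalIndexSketch();
        indexer.register(".txt", IncrementalIndexSketch::readPlainText);
        indexer.register(".html", IncrementalIndexSketch::readHtmlCrudely);

        System.out.println("first run indexed:  " + indexer.run(dir).size());
        System.out.println("second run indexed: " + indexer.run(dir).size());
    }
}
```

Note that the sketch keeps its "already indexed" bookkeeping in memory, so it only shows the skip logic within one process. A job started fresh from cron on each run would need that state persisted somewhere, for example in a file or in the index itself.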