From: "Karthik N S"
To: "Lucene Users List"
Subject: RE: Time to index documents
Date: Thu, 26 Aug 2004 09:27:04 +0530
In-Reply-To: <412D050C.9080309@sun.com>

Hi Hetan,

That's the major problem with non-standardized tags in the HTML documents you are indexing: they slow down the indexing process.
If you can, try tweaking the HTMLParser.jj file under '/demo/html' in lucene.zip (you need some knowledge of JavaCC for this).

Karthik

-----Original Message-----
From: Hetan Shah [mailto:Hetan.Shah@Sun.COM]
Sent: Thursday, August 26, 2004 3:01 AM
To: Lucene Users List
Subject: Time to index documents

Hello all,

Is there a way to reduce the indexing time when the indexer is indexing about 30,000+ files? It is taking roughly 6-7 hours to do this. I am using the IndexHTML class to create the index out of HTML files.

Another issue I see is that every once in a while I get the following output on the screen:

adding ../31/1104852.html
Parse Aborted: Encountered "\"" at line 7, column 1.
Was expecting one of:
... "=" ... ...

Any suggestions on preventing this from happening?

Thanks in advance.
-H

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org