From: Tatu Saloranta
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Subject: Re: Performance question
Date: Thu, 8 Jan 2004 18:42:02 -0700

On Wednesday 07 January 2004 20:48, Dror Matalon wrote:
> On Wed, Jan 07, 2004 at 07:24:22PM -0700, Scott Smith wrote:
...
> > Thanks for the suggestions.
> > I wonder how much faster I can go if I
> > implement some of those?
>
> 25 msecs to insert a document is on the high side, but it depends of
> course on the size of your document. You're probably spending 90% of
> your time in the XML parsing. I believe that there are other parsers
> that are faster than xerces, you might want to look at these. You might
> want to look at http://dom4j.org/.

I think more significant than whether one uses DOM or some other
full-document, in-memory parser is whether to use a streaming (usually
event-based) parser, such as one using SAX. Because they never build the
whole document tree in memory, these are generally an order of magnitude
faster, at least for bigger documents. Fortunately, many standard XML
parsers can work as both DOM and SAX parsers (I believe Xerces does, at
least).

It's a bit more cumbersome to use event-based parsers (push vs. pull; you
need to explicitly keep track of the current subtree if parent tag order
matters), but from a performance perspective (memory usage, speed) it may
be worth it.

-+ Tatu +-
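For anyone who wants to try the event-based route, here is a minimal sketch, not a tuned implementation, using the standard JAXP SAX API in javax.xml.parsers. Xerces and the JDK's built-in parser both implement that API. The <doc>, <title> and <body> element names and the class name SaxFieldExtractor are hypothetical. The handler keeps its own stack of open element names, which is the "keep track of the current subtree" bookkeeping mentioned above, and collects text for a few element paths that could then be turned into Lucene fields. No timings are claimed here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Pulls a few fields out of an XML document with SAX, without building a DOM tree. */
public class SaxFieldExtractor extends DefaultHandler {

    // Hypothetical element paths to turn into index fields; adjust to your own layout.
    private static final Set<String> FIELD_PATHS = Set.of("doc/title", "doc/body");

    // Names of the currently open elements, root first. SAX does not track the
    // current subtree for us, so the handler has to.
    private final Deque<String> openElements = new ArrayDeque<>();
    private String currentPath = "";

    // Text gathered per field path (SAX may deliver one text node in several chunks).
    private final Map<String, StringBuilder> fields = new HashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        openElements.addLast(qName);
        currentPath = String.join("/", openElements);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        openElements.removeLast();
        currentPath = String.join("/", openElements);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String field = enclosingField(currentPath);
        if (field != null) {
            fields.computeIfAbsent(field, k -> new StringBuilder()).append(ch, start, length);
        }
    }

    // Nearest ancestor-or-self path that is a wanted field, so text inside nested
    // markup such as <body>...<b>...</b></body> still counts toward "doc/body".
    private static String enclosingField(String path) {
        String p = path;
        while (!p.isEmpty()) {
            if (FIELD_PATHS.contains(p)) {
                return p;
            }
            int slash = p.lastIndexOf('/');
            p = (slash < 0) ? "" : p.substring(0, slash);
        }
        return null;
    }

    public static Map<String, String> extract(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        SaxFieldExtractor handler = new SaxFieldExtractor();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        Map<String, String> result = new HashMap<>();
        handler.fields.forEach((name, text) -> result.put(name, text.toString().trim()));
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><title>Performance question</title>"
                + "<body>SAX is <b>streaming</b></body></doc>";
        Map<String, String> fields = extract(xml);
        System.out.println(fields.get("doc/title")); // Performance question
        System.out.println(fields.get("doc/body"));  // SAX is streaming
    }
}
```

The handler itself holds only the stack of open element names plus the extracted text, so its memory grows with nesting depth and field size rather than with the size of the whole document; that is roughly where the DOM-vs-SAX difference comes from.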