From: Daniel Noll
Organization: Nuix Pty Ltd
To: java-user@lucene.apache.org
Subject: Re: Document ID shuffling under 2.3.x (on merge?)
Date: Thu, 13 Mar 2008 16:00:11 +1100

On Thursday 13 March 2008 00:42:59 Erick Erickson wrote:
> I certainly found that lazy loading changed my speed dramatically, but
> that was on a particularly field-heavy index.
>
> I wonder if TermEnum/TermDocs would be fast enough on an indexed
> (UN_TOKENIZED???) field for a unique id.
>
> Mostly, I'm hoping you'll try this and tell me if it works so I don't have
> to sometime ....

I added a "uid" field to our existing fields. After the load there were some gaps in the values for this field; presumably those were documents where adding the doc failed and adding the fallback doc also failed.

The index contains 20004 documents. I ran each test over 10 iterations, and the times below are an average of the last 5, as it took around 5 rounds to warm up.

Filter building, for a filter returning 1000 documents randomly selected:
  Time to build filter by UID (100% Derby) - 93ms
  Additional time to build filter by DocID - 12ms (13% penalty)

13% penalty is acceptable IMO. The problem comes next.

Bulk operation building, for a query returning around 2800 documents:
  Time to build the bulkop by DocID (100% Hits) - 6ms
  Time to fetch the "uid" field from the document - 152ms (2600% penalty)
  Time to do the DB query (not counting commit though) - 10ms

For interest's sake I also timed fetching the document with no FieldSelector; that takes around 410ms for the same documents.
So there is still a big benefit in using the field selector; it just isn't anywhere near enough to get it close to the time it takes to retrieve the doc IDs.

Daniel
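
For anyone wanting to repeat this kind of measurement, here is a minimal sketch of the timing approach described above: run the operation 10 times, discard the first 5 runs as warm-up, and average the rest. The class and method names (WarmupBenchmark, timeRuns, averageOfLast, penaltyPercent) are hypothetical and not taken from the original test code, and the workload in main is only a stand-in for the real filter build or field fetch.

```java
// Hypothetical helper; names and structure are illustrative, not from the original post.
final class WarmupBenchmark {

    /** Runs the task the given number of times and returns each run's wall-clock time in ms. */
    static double[] timeRuns(Runnable task, int iterations) {
        double[] millis = new double[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return millis;
    }

    /** Averages the last `keep` entries, treating the earlier runs as warm-up. */
    static double averageOfLast(double[] millis, int keep) {
        if (keep <= 0 || keep > millis.length) {
            throw new IllegalArgumentException("keep must be in 1.." + millis.length);
        }
        double sum = 0.0;
        for (int i = millis.length - keep; i < millis.length; i++) {
            sum += millis[i];
        }
        return sum / keep;
    }

    /** Extra cost expressed as a percentage of the baseline cost. */
    static double penaltyPercent(double baselineMillis, double extraMillis) {
        return 100.0 * extraMillis / baselineMillis;
    }

    public static void main(String[] args) {
        Runnable task = () -> {
            // Stand-in workload; replace with the operation being measured.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i % 10);
            }
            if (sb.length() != 100_000) {
                throw new IllegalStateException("unexpected length");
            }
        };
        double avg = averageOfLast(timeRuns(task, 10), 5);
        System.out.printf("average of last 5 runs: %.3f ms%n", avg);
        // Figures quoted in the message: 12ms of extra work on top of a 93ms baseline.
        System.out.printf("filter penalty: %.0f%%%n", penaltyPercent(93, 12));
    }
}
```

Averaging only the later runs keeps JIT compilation and cold caches out of the reported figure, which is presumably why the first 5 rounds were dropped here.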