Date: Sat, 11 Apr 2009 11:40:51 -0700
Message-ID: <359a92830904111140o128b887cp387b4e70e8afc72d@mail.gmail.com>
Subject: Re: RangeFilter performance problem using MultiReader
From: Erick Erickson
To: java-user@lucene.apache.org

Siiiggghhh. So that means I'll have to really look at TrieRange before I can
appear competent..

Thanks
Erick

On Sat, Apr 11, 2009 at 11:23 AM, Uwe Schindler wrote:

> This is why I invented TrieRange: full-precision dates but fewer terms
> during filtering/searching. With TrieRange on the longs returned by
> Date.getTime() you even have millisecond precision without any speed
> decrease (only a bigger index size).
> Or double values with full precision: everything is possible :-)
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> > -----Original Message-----
> > From: Erick Erickson [mailto:erickerickson@gmail.com]
> > Sent: Saturday, April 11, 2009 6:42 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: RangeFilter performance problem using MultiReader
> >
> > OK, I scanned all the e-mails in this thread so I may be way off base,
> > but has anyone yet asked the basic question of whether the granularity
> > of the dates is really necessary?
> >
> > Raf and Roberto:
> >
> > It appears you're indexing your dates down to second resolution, which
> > is why your number of unique terms is so high. Will it serve your use
> > case to index only down to the day? Or perhaps the hour? That will
> > reduce your number of terms substantially. There is also the
> > possibility of breaking up your dates into two or more fields if you
> > really require the granularity. You could probably run a quick test of
> > this approach just to see how it would change your search times before
> > investing too much time in the process...
> >
> > But I'm entirely ignorant of the MultiReader nuances, so this may be
> > completely irrelevant...
> >
> > Best
> > Erick
> >
> > On Sat, Apr 11, 2009 at 7:36 AM, Uwe Schindler wrote:
> >
> > > In addition to merging each month into one index instead of all in
> > > one index, you could also do an additional optimization when using
> > > the RangeFilter: combine only those indexes needed to fulfil the range
> > > spec during search. So if somebody wants to filter Jan 15 to Feb 15,
> > > only create a MultiReader of the indexes for Jan and Feb. This would
> > > speed up the whole search (also for terms), since the filter would
> > > remove all documents from the other months anyway.
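Uwe's suggestion to open only the indexes whose months intersect the filter range can be sketched in plain Java. This is a minimal sketch under stated assumptions: each index is tagged with a six-digit year+month key as in Roberto's layout (200901, 200902, ...), and `MonthIndexPicker` / `indexesToOpen` are hypothetical names. Opening the selected indexes and wrapping them in a Lucene `MultiReader` is deliberately left out.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MonthIndexPicker {
    // Month keys in the six-digit year+month form used in this thread, e.g. "200901".
    private static final DateTimeFormatter MONTH_KEY = DateTimeFormatter.ofPattern("uuuuMM");

    /**
     * Returns the names of the indexes whose month overlaps [from, to] (inclusive),
     * in month order. Hypothetical helper: the month-to-index map is an assumption.
     */
    static List<String> indexesToOpen(Map<String, List<String>> indexesByMonth,
                                      LocalDate from, LocalDate to) {
        List<String> selected = new ArrayList<>();
        YearMonth last = YearMonth.from(to);
        for (YearMonth m = YearMonth.from(from); !m.isAfter(last); m = m.plusMonths(1)) {
            List<String> names = indexesByMonth.get(m.format(MONTH_KEY));
            if (names != null) {
                selected.addAll(names); // months without an index are simply skipped
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, List<String>> byMonth = new TreeMap<>();
        byMonth.put("200901", List.of("index1", "index2"));
        byMonth.put("200902", List.of("index3"));
        byMonth.put("200903", List.of("index4", "index5", "index6"));

        // Jan 15 to Feb 15: the March indexes are never opened.
        System.out.println(indexesToOpen(byMonth,
                LocalDate.of(2009, 1, 15), LocalDate.of(2009, 2, 15)));
        // prints [index1, index2, index3]
    }
}
```

Once each month is merged into a single index, as Roberto plans, this selection returns at most one index per month.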
> > >
> > > But the best would be to use TrieRange :)
> > >
> > > > -----Original Message-----
> > > > From: Michael McCandless [mailto:lucene@mikemccandless.com]
> > > > Sent: Saturday, April 11, 2009 4:03 PM
> > > > To: java-user@lucene.apache.org
> > > > Subject: Re: RangeFilter performance problem using MultiReader
> > > >
> > > > Ahhh, OK, perhaps that explains the sizable perf difference you're
> > > > seeing w/ optimized vs not. I'm curious to see the results of your
> > > > "merge each month into 1 index" test...
> > > >
> > > > Mike
> > > >
> > > > On Sat, Apr 11, 2009 at 9:21 AM, Roberto Franchini wrote:
> > > > > On Sat, Apr 11, 2009 at 1:50 PM, Michael McCandless wrote:
> > > > >> Hmm then I'm a bit baffled again.
> > > > >>
> > > > >> Because, each of your "by month" indexes presumably has a unique
> > > > >> subset of terms for the "date_doc" field? Meaning, a given "by
> > > > >> month" index will have all date_doc terms corresponding to that
> > > > >> month, and a different "by month" index would presumably have no
> > > > >> overlap in the terms for the date_doc field.
> > > > >
> > > > > Yes and no :) In this situation:
> > > > >
> > > > >>> 200901-->index1, index2
> > > > >>> 200902-->index3
> > > > >>> 200903-->index4, index5, index6
> > > > >
> > > > > the months do not overlap with each other, but index1 and index2
> > > > > overlap, and so do index4, index5 and index6. So there is overlap
> > > > > inside a single month.
> > > > > So next week I want to try this:
> > > > >
> > > > >>> 200901-->index12 (merge of 1 and 2)
> > > > >>> 200902-->index3
> > > > >>> 200903-->index456 (merge of 4, 5, 6)
> > > > >
> > > > > This way we avoid overlap inside a single month. Maybe this can
> > > > > help: stay tuned :)
> > > > > R.
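Uwe's TrieRange suggestion rests on indexing each value at several precisions, so that the middle of a range is matched by a few coarse terms and only the edges need full-precision terms. The stdlib-only Java sketch below shows just that range-splitting step for non-negative `long` values such as those returned by `Date.getTime()`. It is a simplified illustration loosely modeled on the trie idea, not the TrieRange contrib code; `TrieSplitSketch`, `split`, and `precisionStep` are names chosen here for illustration, and encoding each (shift, prefix) pair as an indexed term is omitted.

```java
import java.util.ArrayList;
import java.util.List;

public class TrieSplitSketch {
    /** A run of terms at one precision: values lo..hi, each term covering 2^shift values. */
    record SubRange(long lo, long hi, int shift) {
        long termCount() { return ((hi - lo) >>> shift) + 1; }
    }

    /**
     * Splits [min, max] (inclusive, 0 <= min <= max) into sub-ranges aligned to 2^shift,
     * moving to a coarser precision whenever whole coarser blocks fit in the remainder.
     * Simplified sketch for non-negative values only; not the Lucene implementation.
     */
    static List<SubRange> split(long min, long max, int precisionStep) {
        if (min < 0 || min > max || precisionStep < 1 || precisionStep > 32) {
            throw new IllegalArgumentException("need 0 <= min <= max and 1 <= precisionStep <= 32");
        }
        List<SubRange> out = new ArrayList<>();
        for (int shift = 0; ; shift += precisionStep) {
            long lowBits = (1L << shift) - 1;            // bits already dropped at this precision
            long diff = 1L << (shift + precisionStep);   // width of one block at the next precision
            long mask = ((1L << precisionStep) - 1) << shift;
            boolean hasLower = (min & mask) != 0;        // min is not at the start of a coarser block
            boolean hasUpper = (max & mask) != mask;     // max is not at the end of a coarser block
            long nextMin = (hasLower ? min + diff : min) & ~mask;
            long nextMax = (hasUpper ? max - diff : max) & ~mask;
            if (shift + precisionStep >= 63 || nextMin > nextMax
                    || nextMin < min || nextMax > max) {
                // No whole coarser block fits in what is left: emit it at this precision.
                out.add(new SubRange(min, max | lowBits, shift));
                return out;
            }
            if (hasLower) out.add(new SubRange(min, min | mask | lowBits, shift));
            if (hasUpper) out.add(new SubRange(max & ~mask, max | lowBits, shift));
            min = nextMin;
            max = nextMax;
        }
    }

    public static void main(String[] args) {
        List<SubRange> parts = split(5, 27, 2);
        long terms = parts.stream().mapToLong(SubRange::termCount).sum();
        System.out.println(parts);
        // prints [SubRange[lo=5, hi=7, shift=0], SubRange[lo=8, hi=27, shift=2]]
        System.out.println(terms + " terms instead of " + (27 - 5 + 1));
        // prints 8 terms instead of 23
    }
}
```

Each emitted `SubRange` stands for a run of terms at one precision. A smaller `precisionStep` indexes more terms per value (the bigger index Uwe mentions) but leaves fewer terms to visit per range.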
> > > > >
> > > > > --
> > > > > Roberto Franchini
> > > > > http://www.celi.it
> > > > > http://www.blogmeter.it
> > > > > http://www.memesphere.it
> > > > > Tel +39-011-6600814
> > > > > jabber:ro.franchini@gmail.com
> > > > > skype:ro.franchini
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > For additional commands, e-mail: java-user-help@lucene.apache.org
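Erick's question about date granularity is easy to check before touching the index: the work a RangeFilter does grows with the number of distinct terms inside the range. This stdlib-only sketch uses synthetic timestamps (one every 37 seconds for 30 days, invented for illustration, not data from the thread) to count distinct terms at second, hour, and day resolution, and shows one assumed way to break a timestamp into a day field and a time-of-day field. The term formats and all names are illustrative assumptions.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DateGranularitySketch {
    // Illustrative term formats (an assumption, not the thread's format), all in UTC.
    // Each sorts lexicographically in time order, which term-based range filters rely on.
    static final DateTimeFormatter SECOND = fmt("uuuuMMddHHmmss");
    static final DateTimeFormatter HOUR = fmt("uuuuMMddHH");
    static final DateTimeFormatter DAY = fmt("uuuuMMdd");
    static final DateTimeFormatter TIME_OF_DAY = fmt("HHmmss");

    static DateTimeFormatter fmt(String pattern) {
        return DateTimeFormatter.ofPattern(pattern).withZone(ZoneOffset.UTC);
    }

    /** Number of distinct terms the given resolution produces for these timestamps. */
    static int distinctTerms(List<Instant> times, DateTimeFormatter resolution) {
        Set<String> terms = new HashSet<>();
        for (Instant t : times) {
            terms.add(resolution.format(t));
        }
        return terms.size();
    }

    /** One timestamp as two terms for two hypothetical fields: day and time of day. */
    static String[] splitFields(Instant t) {
        return new String[] {DAY.format(t), TIME_OF_DAY.format(t)};
    }

    /** Synthetic timestamps: one every `step`, for `days` days starting at `start`. */
    static List<Instant> synthetic(Instant start, int days, Duration step) {
        List<Instant> out = new ArrayList<>();
        Instant end = start.plus(Duration.ofDays(days));
        for (Instant t = start; t.isBefore(end); t = t.plus(step)) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2009-03-01T00:00:00Z");
        List<Instant> times = synthetic(start, 30, Duration.ofSeconds(37));
        System.out.println("second: " + distinctTerms(times, SECOND)); // 70055
        System.out.println("hour:   " + distinctTerms(times, HOUR));   // 720
        System.out.println("day:    " + distinctTerms(times, DAY));    // 30
        // 12:34:56 on the first day: day term 20090301, time-of-day term 123456
        System.out.println(String.join(" ", splitFields(start.plusSeconds(45296))));
    }
}
```

With the two-field layout, a multi-day range would match whole days on the day field and consult the time-of-day field only for the first and last day; building that combined query is not shown.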