lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: RangeFilter performance problem using MultiReader
Date Sat, 11 Apr 2009 14:36:16 GMT
In addition to merging each month's indexes into one index (rather than
merging everything into a single index), you can optimize further when
using the RangeFilter:
Combine only the indexes needed to cover the requested range at search
time. If somebody wants to filter Jan 15 to Feb 15, create a MultiReader
over just the Jan and Feb indexes. This speeds up the whole search (for
term queries too), since the filter would otherwise just discard every
document from the other months anyway.
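
A rough sketch of the selection step, in plain Java with no Lucene calls.
The class name, the method name, and the month-to-index map are made up for
illustration. The index names follow the layout quoted below. It only
decides which monthly indexes overlap the requested range. Opening one
IndexReader per selected index and wrapping them in a MultiReader is left
out.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene.
class MonthlyIndexSelector {

    // Returns the names of all indexes whose month overlaps [from, to], inclusive.
    static List<String> selectIndexes(Map<YearMonth, List<String>> indexesByMonth,
                                      LocalDate from, LocalDate to) {
        List<String> selected = new ArrayList<>();
        YearMonth last = YearMonth.from(to);
        // Walk month by month from the first month of the range to the last.
        for (YearMonth m = YearMonth.from(from); !m.isAfter(last); m = m.plusMonths(1)) {
            List<String> names = indexesByMonth.get(m);
            if (names != null) {
                selected.addAll(names);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Assumed layout: one or more index names per month.
        Map<YearMonth, List<String>> byMonth = new LinkedHashMap<>();
        byMonth.put(YearMonth.of(2009, 1), Arrays.asList("index1", "index2"));
        byMonth.put(YearMonth.of(2009, 2), Arrays.asList("index3"));
        byMonth.put(YearMonth.of(2009, 3), Arrays.asList("index4", "index5", "index6"));

        // Filter range Jan 15 to Feb 15: only the Jan and Feb indexes are needed.
        System.out.println(selectIndexes(byMonth,
                LocalDate.of(2009, 1, 15), LocalDate.of(2009, 2, 15)));
    }
}
```

The RangeFilter is still needed on top of this, because the Jan and Feb
indexes also contain documents from before Jan 15 and after Feb 15.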

But the best option would be to use TrieRange :)

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Michael McCandless [mailto:lucene@mikemccandless.com]
> Sent: Saturday, April 11, 2009 4:03 PM
> To: java-user@lucene.apache.org
> Subject: Re: RangeFilter performance problem using MultiReader
> 
> Ahhh, OK, perhaps that explains the sizable perf difference you're
> seeing w/ optimized vs not.  I'm curious to see the results of your
> "merge each month into 1 index" test...
> 
> Mike
> 
> On Sat, Apr 11, 2009 at 9:21 AM, Roberto Franchini
> <ro.franchini@gmail.com> wrote:
> > On Sat, Apr 11, 2009 at 1:50 PM, Michael McCandless
> > <lucene@mikemccandless.com> wrote:
> >> Hmm then I'm a bit baffled again.
> >>
> >> Because, each of your "by month" indexes presumably has a unique
> >> subset of terms for the "date_doc" field?  Meaning, a given "by month"
> >> index will have all date_doc corresponding to that month, and a
> >> different "by month" index would presumably have no overlap in the
> >> terms for the date_doc field.
> >
> > Yes and no :) In this situation:
> >
> >>> 200901-->index1, index2
> >>> 200902-->index3
> >>> 200903-->index4,index5,index6
> >
> > the months do not overlap with each other, but index1 and index2
> > overlap, and so do index4, index5, and index6. So there is overlap
> > inside a single month.
> > So next week I want to try this layout:
> >>> 200901-->index12 (merge of 1 and 2)
> >>> 200902-->index3
> >>> 200903-->index456 (merge of 4,5,6)
> >
> > This way we avoid overlapping inside a single month. Maybe this can
> > help: stay tuned :)
> > R.
> >
> >
> > --
> > Roberto Franchini
> > http://www.celi.it
> > http://www.blogmeter.it
> > http://www.memesphere.it
> > Tel +39-011-6600814
> > jabber:ro.franchini@gmail.com skype:ro.franchini
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 
