lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <...@thetaphi.de>
Subject RE: RangeFilter performance problem using MultiReader
Date Sat, 11 Apr 2009 18:23:32 GMT
This is why I invented TrieRange: Full precision dates but less terms during
filtering/searching. With TrieRange on the longs returned bay Date.getTime()
you even have precision of milliseconds without any speed decrease (only
bigger index size). Or double values with full precision, everything is
possible :-)

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Saturday, April 11, 2009 6:42 PM
> To: java-user@lucene.apache.org
> Subject: Re: RangeFilter performance problem using MultiReader
> 
> OK, I scanned all the e-mails in this thread so I may be way off base, but
> has anyone yet asked the basic question of whether the granularity of the
> dates is really necessary <G>?
> 
> Raf and Roberto:
> 
> It appears you're indexing your dates down to second resolution, which
> is why your number of unique terms is so high. Will it serve your use-case
> to only index down to day? or perhaps hour? That will reduce your number
> of terms substantially. There is also the possibility of breaking up your
> dates into two or more fields if you really require the granularity. You
> could
> probably run a quick test of this approach just to see how it would change
> your search times before investing too muchtime in the process....
> 
> But I'm entirely ignorant of the multireader nuances, so this may be
> completely
> irrelevant....
> 
> Best
> Erick
> 
> 
> On Sat, Apr 11, 2009 at 7:36 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
> 
> > In addition to merging each month into one index instead of all in one
> > index, you could also do some additional optimization when using the
> Range
> > filter:
> > Just combine only those indexes needed to fulfil the range spec during
> > search. So if somebody want to filter Jan 15 to Feb 15, only create a
> > MultiReader of the indexes for Jan and Feb, this would speed up the
> whole
> > search (also for terms), as the filter would simply remove all documents
> > from the wrong months.
> >
> > But the best would be to use TrieRange :)
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> > > -----Original Message-----
> > > From: Michael McCandless [mailto:lucene@mikemccandless.com]
> > > Sent: Saturday, April 11, 2009 4:03 PM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: RangeFilter performance problem using MultiReader
> > >
> > > Ahhh, OK, perhaps that explains the sizable perf difference you're
> > > seeing w/ optimized vs not.  I'm curious to see the results of your
> > > "merge each month into 1 index" test...
> > >
> > > Mike
> > >
> > > On Sat, Apr 11, 2009 at 9:21 AM, Roberto Franchini
> > > <ro.franchini@gmail.com> wrote:
> > > > On Sat, Apr 11, 2009 at 1:50 PM, Michael McCandless
> > > > <lucene@mikemccandless.com> wrote:
> > > >> Hmm then I'm a bit baffled again.
> > > >>
> > > >> Because, each of your "by month" indexes presumably has a unique
> > > >> subset of terms for the "date_doc" field?  Meaning, a given "by
> month"
> > > >> index will have all date_doc corresponding to that month, and a
> > > >> different "by month" index would presumably have no overlap in the
> > > >> terms for the date_doc field.
> > > >
> > > > Yes and no :) In this situation:
> > > >
> > > >>> 200901-->index1, index2
> > > >>> 200902-->index3
> > > >>> 200903-->index4,index5,index6
> > > >
> > > > each month does not overlap with each other, but index1 and index2
> > > > overlap, and so index4 with 5 and 6. So there's overlapping inside a
> > > > single month.
> > > > So I want to trie, next week, this one:
> > > >>> 200901-->index12 (merge of 1 and 2)
> > > >>> 200902-->index3
> > > >>> 200903-->index456 (merge of 4,5,6)
> > > >
> > > > This way we avoid overlapping inside a single month. Maybe this can
> > > > help: stay tuned :)
> > > > R.
> > > >
> > > >
> > > > --
> > > > Roberto Franchini
> > > > http://www.celi.it
> > > > http://www.blogmeter.it
> > > > http://www.memesphere.it
> > > > Tel +39-011-6600814
> > > > jabber:ro.franchini@gmail.com
> <jabber%3Aro.franchini@gmail.com>skype:ro.franchini
> > > >
> > > > --------------------------------------------------------------------
> -
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message