lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <erickerick...@gmail.com>
Subject Re: RangeFilter performance problem using MultiReader
Date Sat, 11 Apr 2009 18:40:51 GMT
Siiiggghhh. So that means I'll have to really look at TrieRange before I
can appear competent<G>..

Thanks
Erick

On Sat, Apr 11, 2009 at 11:23 AM, Uwe Schindler <uwe@thetaphi.de> wrote:

> This is why I invented TrieRange: Full precision dates but less terms
> during
> filtering/searching. With TrieRange on the longs returned bay
> Date.getTime()
> you even have precision of milliseconds without any speed decrease (only
> bigger index size). Or double values with full precision, everything is
> possible :-)
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> > -----Original Message-----
> > From: Erick Erickson [mailto:erickerickson@gmail.com]
> > Sent: Saturday, April 11, 2009 6:42 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: RangeFilter performance problem using MultiReader
> >
> > OK, I scanned all the e-mails in this thread so I may be way off base,
> but
> > has anyone yet asked the basic question of whether the granularity of the
> > dates is really necessary <G>?
> >
> > Raf and Roberto:
> >
> > It appears you're indexing your dates down to second resolution, which
> > is why your number of unique terms is so high. Will it serve your
> use-case
> > to only index down to day? or perhaps hour? That will reduce your number
> > of terms substantially. There is also the possibility of breaking up your
> > dates into two or more fields if you really require the granularity. You
> > could
> > probably run a quick test of this approach just to see how it would
> change
> > your search times before investing too muchtime in the process....
> >
> > But I'm entirely ignorant of the multireader nuances, so this may be
> > completely
> > irrelevant....
> >
> > Best
> > Erick
> >
> >
> > On Sat, Apr 11, 2009 at 7:36 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
> >
> > > In addition to merging each month into one index instead of all in one
> > > index, you could also do some additional optimization when using the
> > Range
> > > filter:
> > > Just combine only those indexes needed to fulfil the range spec during
> > > search. So if somebody want to filter Jan 15 to Feb 15, only create a
> > > MultiReader of the indexes for Jan and Feb, this would speed up the
> > whole
> > > search (also for terms), as the filter would simply remove all
> documents
> > > from the wrong months.
> > >
> > > But the best would be to use TrieRange :)
> > >
> > > -----
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > http://www.thetaphi.de
> > > eMail: uwe@thetaphi.de
> > >
> > > > -----Original Message-----
> > > > From: Michael McCandless [mailto:lucene@mikemccandless.com]
> > > > Sent: Saturday, April 11, 2009 4:03 PM
> > > > To: java-user@lucene.apache.org
> > > > Subject: Re: RangeFilter performance problem using MultiReader
> > > >
> > > > Ahhh, OK, perhaps that explains the sizable perf difference you're
> > > > seeing w/ optimized vs not.  I'm curious to see the results of your
> > > > "merge each month into 1 index" test...
> > > >
> > > > Mike
> > > >
> > > > On Sat, Apr 11, 2009 at 9:21 AM, Roberto Franchini
> > > > <ro.franchini@gmail.com> wrote:
> > > > > On Sat, Apr 11, 2009 at 1:50 PM, Michael McCandless
> > > > > <lucene@mikemccandless.com> wrote:
> > > > >> Hmm then I'm a bit baffled again.
> > > > >>
> > > > >> Because, each of your "by month" indexes presumably has a unique
> > > > >> subset of terms for the "date_doc" field?  Meaning, a given "by
> > month"
> > > > >> index will have all date_doc corresponding to that month, and
a
> > > > >> different "by month" index would presumably have no overlap in
the
> > > > >> terms for the date_doc field.
> > > > >
> > > > > Yes and no :) In this situation:
> > > > >
> > > > >>> 200901-->index1, index2
> > > > >>> 200902-->index3
> > > > >>> 200903-->index4,index5,index6
> > > > >
> > > > > each month does not overlap with each other, but index1 and index2
> > > > > overlap, and so index4 with 5 and 6. So there's overlapping inside
> a
> > > > > single month.
> > > > > So I want to trie, next week, this one:
> > > > >>> 200901-->index12 (merge of 1 and 2)
> > > > >>> 200902-->index3
> > > > >>> 200903-->index456 (merge of 4,5,6)
> > > > >
> > > > > This way we avoid overlapping inside a single month. Maybe this can
> > > > > help: stay tuned :)
> > > > > R.
> > > > >
> > > > >
> > > > > --
> > > > > Roberto Franchini
> > > > > http://www.celi.it
> > > > > http://www.blogmeter.it
> > > > > http://www.memesphere.it
> > > > > Tel +39-011-6600814
> > > > > jabber:ro.franchini@gmail.com <jabber%3Aro.franchini@gmail.com>
> > <jabber%3Aro.franchini@gmail.com <jabber%253Aro.franchini@gmail.com>
> >skype:ro.franchini
> > > > >
> > > > >
> --------------------------------------------------------------------
> > -
> > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > > >
> > > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message