lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shalin Shekhar Mangar <shalinman...@gmail.com>
Subject Re: Date faceting - howto improve performance
Date Wed, 29 Apr 2009 20:13:02 GMT
Some basic documentation is in the example schema.xml. Ask away if you have
specific questions.

On Thu, Apr 30, 2009 at 1:00 AM, Marcus Herou <marcus.herou@tailsweep.com>wrote:

> Aha!
>
> Hmm , googling wont help me I see. any hints of usages ?
>
> /M
>
>
> On Tue, Apr 28, 2009 at 12:29 AM, Shalin Shekhar Mangar <
> shalinmangar@gmail.com> wrote:
>
> > Sorry, I'm late in this thread.
> >
> > Did you try using Trie fields (new in 1.4)? The regular date faceting
> won't
> > work out-of-the-box for trie fields I think. But you could use
> facet.query
> > to achieve the same effect. On my simple benchmarks I've found trie
> fields
> > to give a huge improvement in range searches.
> >
> > On Sat, Apr 25, 2009 at 4:24 PM, Marcus Herou <
> marcus.herou@tailsweep.com
> > >wrote:
> >
> > > Hi.
> > >
> > > One of our faceting use-cases:
> > > We are creating trend graphs of how many blog posts that contains a
> > certain
> > > term and groups it by day/week/year etc. with the nice DateMathParser
> > > functions.
> > >
> > > The performance degrades really fast and consumes a lot of memory which
> > > forces OOM from time to time
> > > We think it is due the fact that the cardinality of the field
> > publishedDate
> > > in our index is huge, almost equal to the nr of documents in the index.
> > >
> > > We need to address that...
> > >
> > > Some questions:
> > >
> > > 1. Can a datefield have other date-formats than the default of
> yyyy-MM-dd
> > > HH:mm:ssZ ?
> > >
> > > 2. We are thinking of adding a field to the index which have the format
> > > yyyy-MM-dd to reduce the cardinality, if that field can't be a date, it
> > > could perhaps be a string, but the question then is if faceting can be
> > used
> > > ?
> > >
> > > 3. Since we now already have such a huge index, is there a way to add a
> > > field afterwards and apply it to all documents without actually
> > reindexing
> > > the whole shebang ?
> > >
> > > 4. If the field cannot be a string can we just leave out the
> > > hour/minute/second information and to reduce the cardinality and
> improve
> > > performance ? Example: 2009-01-01 00:00:00Z
> > >
> > > 5. I am afraid that we need to reindex everything to get this to work
> > > (negates Q3). We have 8 shards as of current, what would the most
> > efficient
> > > way be to reindexing the whole shebang ? Dump the entire database to
> disk
> > > (sigh), create many xml file splits and use curl in a
> > > random/hash(numServers) manner on them ?
> > >
> > >
> > > Kindly
> > >
> > > //Marcus
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > --
> > > Marcus Herou CTO and co-founder Tailsweep AB
> > > +46702561312
> > > marcus.herou@tailsweep.com
> > > http://www.tailsweep.com/
> > > http://blogg.tailsweep.com/
> > >
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>
>
>
> --
> Marcus Herou CTO and co-founder Tailsweep AB
> +46702561312
> marcus.herou@tailsweep.com
> http://www.tailsweep.com/
> http://blogg.tailsweep.com/
>



-- 
Regards,
Shalin Shekhar Mangar.

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message