lucene-solr-user mailing list archives

From Tomás Fernández Löbbe <tomasflo...@gmail.com>
Subject Re: Slow faceting performance on a docValues field
Date Tue, 13 Jan 2015 19:30:56 GMT
"fc", "fcs" and "enum" only apply for field faceting, not range faceting.

Tomás

On Tue, Jan 13, 2015 at 11:24 AM, David Smith <dsmithsolr@yahoo.com.invalid>
wrote:

> What is stumping me is that the search result has 3 hits, yet faceting
> those 3 hits takes 24 seconds.  The documentation for facet.method=fc is
> quite explicit about how Solr does faceting:
>
>
> "fc (stands for Field Cache) The facet counts are calculated by iterating
> over documents that match the query and summing the terms that appear in
> each document. This was the default method for single valued fields prior
> to Solr 1.4."
>
> If a search yielded millions of hits, I could understand 24 seconds to
> calculate the facets.  But not for a search with only 3 hits.
>
>
> What am I missing?
>
> Regards,
> David
>
>
>
>
>
>      On Tuesday, January 13, 2015 1:12 PM, Tomás Fernández Löbbe <
> tomasflobbe@gmail.com> wrote:
>
>
>  No, you are not misreading, right now there is no automatic way of
> generating the intervals on the server side similar to range faceting... I
> guess it won't work in your case. Maybe you should create a Jira to add
> this feature to interval faceting.
>
> Tomás
>
> On Tue, Jan 13, 2015 at 10:44 AM, David Smith <dsmithsolr@yahoo.com.invalid>
> wrote:
>
> > Tomás,
> >
> >
> > Thanks for the response -- the performance of my query makes perfect sense
> > in light of your information.
> > I looked at Interval faceting.  My required interval is 1 day.  I cannot
> > change that requirement.  Unless I am mis-reading the doc, that means to
> > facet a 10 year range, the query needs to specify over 3,600 intervals ??
> >
> >
> >
> > f.eventDate.facet.interval.set=[2005-01-01T00:00:00.000Z,2005-01-01T23:59:59.999Z]&f.eventDate.facet.interval.set=[2005-01-02T00:00:00.000Z,2005-01-02T23:59:59.999Z]&etc,etc
> >
> >
> > Each query would be 185MB in size if I structure it this way.
> >
> > I assume I must be mis-understanding how to use Interval faceting with
> > dates.  Are there any concrete examples you know of?  A google search did
> > not come up with much.
> >
> > Kind regards,
> > Dave
> >
> >      On Tuesday, January 13, 2015 12:16 PM, Tomás Fernández Löbbe <
> > tomasflobbe@gmail.com> wrote:
> >
> >
> >  Range faceting won't use DocValues even if they are set; it
> > translates each gap to a filter. This means that it will end up using the
> > FilterCache, which should cause faster followup queries if you repeat the
> > same gaps (and don't commit).
> > You may also want to try interval faceting, it will use DocValues instead
> > of filters. The API is different, you'll have to provide the intervals
> > yourself.
> >
> > Tomás
> >
> > On Tue, Jan 13, 2015 at 10:01 AM, Shawn Heisey <apache@elyograg.org>
> > wrote:
> >
> > > On 1/13/2015 10:35 AM, David Smith wrote:
> > > > I have a query against a single 50M doc index (175GB) using Solr 4.10.2,
> > > > that exhibits the following response times (via the debugQuery option in
> > > > Solr Admin):
> > > > "process": {
> > > >  "time": 24709,
> > > >  "query": { "time": 54 }, "facet": { "time": 24574 },
> > > >
> > > >
> > > > The query time of 54ms is great and exactly as expected -- this example
> > > > was a single-term search that returned 3 hits.
> > > > I am trying to get the facet time (24.5 seconds) to be sub-second, and
> > > > am having no luck.  The facet part of the query is as follows:
> > > >
> > > > "params": { "facet.range": "eventDate",
> > > >  "f.eventDate.facet.range.end": "2015-05-13T16:37:18.000Z",
> > > >  "f.eventDate.facet.range.gap": "+1DAY",
> > > >  "start": "0",
> > > >
> > > >  "rows": "10",
> > > >
> > > >  "f.eventDate.facet.range.start": "2005-03-13T16:37:18.000Z",
> > > >
> > > >  "f.eventDate.facet.mincount": "1",
> > > >
> > > >  "facet": "true",
> > > >
> > > >  "debugQuery": "true",
> > > >  "_": "1421169383802"
> > > >  }
> > > >
> > > > And, the relevant schema definition is as follows:
> > > >
> > > >    <field name="eventDate" type="tdate" indexed="true" stored="true" multiValued="false" docValues="true"/>
> > > >
> > > >    <!-- A Trie based date field for faster date range queries and date faceting. -->
> > > >    <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>
> > > >
> > > >
> > > > During the 25-second query, the Solr JVM pegs one CPU, with little or no
> > > > I/O activity detected on the drive that holds the 175GB index.  I have 48GB
> > > > of RAM, 1/2 of that dedicated to the OS and the other to the Solr JVM.
> > > >
> > > > I do NOT have any fieldValue caches configured as yet, because my
> > > > (perhaps too simplistic?) reading of the documentation was that DocValues
> > > > eliminates the need for a field-level cache on this facet field.
> > >
> > > 24GB of RAM to cache 175GB is probably not enough in the general case,
> > > but if you're seeing very little disk I/O activity for this query, then
> > > we'll leave that alone and you can worry about it later.
> > >
> > > What I would try immediately is setting the facet.method parameter to
> > > enum and seeing what that does to the facet time.  I've had good luck
> > > generally with that, even in situations where the docs indicated that
> > > the default (fc) was supposed to work better.  I have never explored the
> > > relationship between facet.method and docValues, though.
> > >
> > > I'm out of ideas after this.  I don't have enough experience with
> > > faceting to help much.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
> >
> >
>
>
>
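
The thread above notes that interval faceting needs every interval supplied by
the client. A minimal Python sketch of generating one interval per day for the
date range in the original query follows. The helper name
daily_interval_params is hypothetical, and the half-open "[start,end)" form is
an assumption chosen so that adjacent days do not overlap; check it against
the interval faceting documentation for your Solr version.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def daily_interval_params(start, end, field="eventDate"):
    """Return one (key, value) facet.interval.set pair per day in [start, end).

    Hypothetical helper for illustration; it is not part of Solr or any client.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    key = f"f.{field}.facet.interval.set"
    params = []
    day = start
    while day < end:
        nxt = min(day + timedelta(days=1), end)
        # "[a,b)" includes a and excludes b (assumed interval syntax).
        params.append((key, f"[{day.strftime(fmt)},{nxt.strftime(fmt)})"))
        day = nxt
    return params


# Same start and end as facet.range.start / facet.range.end in the thread.
start = datetime(2005, 3, 13, 16, 37, 18, tzinfo=timezone.utc)
end = datetime(2015, 5, 13, 16, 37, 18, tzinfo=timezone.utc)

params = daily_interval_params(start, end)
encoded = urlencode(params)  # form-encoded body for the repeated parameter

print(len(params), "intervals,", len(encoded), "bytes when URL-encoded")
```

A parameter list this long will likely exceed typical URL length limits, so
sending it as a form-encoded POST body instead of a GET query string is
probably the safer choice.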
