lucene-solr-user mailing list archives

From Otis Gospodnetic <otis.gospodne...@gmail.com>
Subject Re: Facet performance
Date Fri, 18 Oct 2013 19:28:24 GMT
DocValues is the new black
http://wiki.apache.org/solr/DocValues

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
SOLR Performance Monitoring -- http://sematext.com/spm



On Fri, Oct 18, 2013 at 12:30 PM, Lemke, Michael  SZ/HZA-ZSW
<lemkemch@schaeffler.com> wrote:
> Toke Eskildsen [mailto:te@statsbiblioteket.dk] wrote:
>>Lemke, Michael  SZ/HZA-ZSW [lemkemch@schaeffler.com] wrote:
>>> 1. q=word&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0
>>> 2. q=word&facet.field=CONTENT&facet=true&facet.prefix=a&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0
>>
>>> The only difference is an empty facet.prefix in the first query.
>>
>>> The first query returns after some 20 seconds (QTime 20000 in the result) while
>>> the second one takes only 80 msec (QTime 80). Why is this?
>>
>>If your index was just opened when you issued your queries, the first request will
>>be notably slower than the second as the facet values might not be in
>>the disk cache.
>
> I know, but it shouldn't be orders of magnitude as in this example, should it?
>
>>
>>Furthermore, for enum the difference between no prefix and some prefix is huge. As
>>enum iterates values first (as opposed to fc, which iterates hits first), limiting to only the
>>values that start with 'a' ought to speed up retrieval by a factor of 10 or more.
>
> Thanks.  That is what we sort of figured but it's good to know for sure.  Of course it
> raises the question of whether there is a way to speed this up.
>
>>
>>> And as a side note: facet.method=fc makes the queries run 'forever' and eventually
>>> fail with org.apache.solr.common.SolrException: Too many values for UnInvertedField
>>> faceting on field CONTENT.
>>
>>An internal memory structure optimization in Solr limits the number of possible unique
>>values when using fc. It is not a bug as such, but more a consequence of a choice. Unfortunately
>>the enum solution is normally quite slow when there are enough unique values to trigger the
>>"too many values" exception. I know too little about the structures for DocValues to say if
>>they will help here, but you might want to take a look at those.
>
> What is DocValues?  Haven't heard of it yet.  And yes, the fc method was terribly slow
> in a case where it did work.  Something like 20 minutes whereas enum returned within a few
> seconds.
>
> Michael
>
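
The enum-versus-fc distinction quoted above (enum iterates values first, fc iterates hits first) can be illustrated with a toy model. This is only a sketch under stated assumptions, not Solr's actual implementation: the index contents, the function names facet_enum and facet_fc, and the "visited" counter are all hypothetical and exist only to show why a prefix shrinks the work for a term-first strategy.

```python
from collections import Counter

# Hypothetical toy inverted index: term -> set of ids of documents containing it.
INDEX = {
    "apple": {1, 2, 3},
    "avocado": {2},
    "banana": {1, 4},
    "cherry": {3, 4, 5},
}


def facet_enum(index, hits, prefix=""):
    """Term-first counting: walk the terms, intersect each with the hit set.

    Returns (counts, visited), where visited is the number of terms that
    had to be intersected. A non-empty prefix reduces that number.
    """
    counts = {}
    visited = 0
    for term in sorted(index):
        if not term.startswith(prefix):
            continue  # terms outside the prefix are never intersected
        visited += 1
        n = len(index[term] & hits)
        if n >= 1:  # mirrors facet.mincount=1
            counts[term] = n
    return counts, visited


def facet_fc(index, hits, prefix=""):
    """Hit-first counting: walk the hits, count the terms of each document."""
    # "Un-invert" the index into doc -> terms. For a field with very many
    # unique values this structure becomes large.
    doc_terms = {}
    for term, docs in index.items():
        for doc in docs:
            doc_terms.setdefault(doc, []).append(term)

    counts = Counter()
    for doc in hits:
        for term in doc_terms.get(doc, []):
            if term.startswith(prefix):
                counts[term] += 1
    return dict(counts)


if __name__ == "__main__":
    hits = {1, 2, 3}  # pretend these documents matched q=word
    print(facet_enum(INDEX, hits))       # every term is visited
    print(facet_enum(INDEX, hits, "a"))  # only terms starting with 'a'
    print(facet_fc(INDEX, hits, "a"))
```

Both strategies return the same counts in this model; what differs is the amount of work. With an empty prefix the term-first loop touches every unique term in the field, which is consistent with the slow first query on a field like CONTENT, while the hit-first variant first needs the doc-to-terms structure to be built.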
