lucene-solr-user mailing list archives

From "Lemke, Michael SZ/HZA-ZSW" <lemke...@schaeffler.com>
Subject RE: Facet performance
Date Fri, 18 Oct 2013 16:30:20 GMT
Toke Eskildsen [mailto:te@statsbiblioteket.dk] wrote:
>Lemke, Michael  SZ/HZA-ZSW [lemkemch@schaeffler.com] wrote:
>> 1. q=word&facet.field=CONTENT&facet=true&facet.prefix=&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0
>> 2. q=word&facet.field=CONTENT&facet=true&facet.prefix=a&facet.limit=10&facet.mincount=1&facet.method=enum&rows=0
>
>> The only difference is an empty facet.prefix in the first query.
>
>> The first query returns after some 20 seconds (QTime 20000 in the result) while
>> the second one takes only 80 msec (QTime 80). Why is this?
>
>If your index was just opened when you issued your queries, the first request will be notably
>slower than the second as the facet values might not be in the disk cache.

I know, but it shouldn't be orders of magnitude as in this example, should it?

>
>Furthermore, for enum the difference between no prefix and some prefix is huge. As enum
>iterates values first (as opposed to fc, which iterates hits first), limiting to only the values
>that start with 'a' ought to speed up retrieval by a factor of 10 or more.

Thanks.  That is what we sort of figured but it's good to know for sure.  Of course it raises
the question of whether there is a way to speed this up.
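
To make the point about enum concrete, here is a toy Python sketch of value-first faceting. It is only a model under stated assumptions, not Solr's actual implementation: `enum_facet`, `term_index` and `hits` are hypothetical names, and the "index" is a plain dict mapping each term to the set of documents containing it. It shows that the number of terms visited, and so the number of set intersections, depends on the prefix rather than on the size of the result set.

```python
from bisect import bisect_left


def enum_facet(term_index, hits, prefix="", limit=10, mincount=1):
    """Toy model of enum-style faceting (hypothetical, not Solr code).

    term_index: dict mapping term -> set of doc ids containing it
    hits:       set of doc ids matching the query
    Returns (top facet counts, number of terms visited).
    """
    terms = sorted(term_index)
    # Jump straight to the first term >= prefix, like seeking in a term dictionary.
    start = bisect_left(terms, prefix)
    counts = []
    visited = 0
    for term in terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match the prefix
        visited += 1
        # One intersection per visited term is where the time goes.
        count = len(term_index[term] & hits)
        if count >= mincount:
            counts.append((term, count))
    # Highest count first, ties broken alphabetically.
    counts.sort(key=lambda tc: (-tc[1], tc[0]))
    return counts[:limit], visited


term_index = {
    "apple": {1, 2, 3},
    "avocado": {2},
    "banana": {1, 4},
    "cherry": {3, 4},
    "word": {1, 2, 3, 4},
}
hits = {1, 2, 3}

all_counts, all_visited = enum_facet(term_index, hits)              # empty prefix
a_counts, a_visited = enum_facet(term_index, hits, prefix="a")      # prefix "a"
print(all_visited, a_visited)
```

With the empty prefix every term in the field is visited (5 in this toy data), while the prefix "a" touches only 2. On a tokenized full-text field with a very large number of unique terms, the same effect would plausibly account for a gap like the one between the two queries.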

>
>> And as a side note: facet.method=fc makes the queries run 'forever' and eventually
>> fail with org.apache.solr.common.SolrException: Too many values for UnInvertedField
>> faceting on field CONTENT.
>
>An internal memory structure optimization in Solr limits the number of possible unique
>values when using fc. It is not a bug as such, but more a consequence of a choice. Unfortunately
>the enum solution is normally quite slow when there are enough unique values to trigger the
>"too many values" exception. I know too little about the structures for DocValues to say if
>they will help here, but you might want to take a look at those.

What is DocValues?  I haven't heard of it yet.  And yes, the fc method was terribly slow in
a case where it did work: something like 20 minutes, whereas enum returned within a few seconds.

Michael

