lucene-solr-user mailing list archives

From "Alice.H.Yang (mis.cnsh04.Newegg) 41493" <Alice.H.Y...@newegg.com>
Subject (Issue) How improve solr group performance
Date Wed, 28 May 2014 10:42:33 GMT
Hi, all
	Does anybody have any advice for me on Solr group performance? I have no idea how to improve
the grouping performance.

To David Smiley
  	I am not responsible for Endeca, so unfortunately I have no comment on it.

Best Regards,
Alice Yang
+86-021-51530666*41493
Floor 19,KaiKai Plaza,888,Wanhandu Rd,Shanghai(200042)

-----Original Message-----
From: david.w.smiley@gmail.com [mailto:david.w.smiley@gmail.com] 
Sent: May 27, 2014 21:29
To: solr-user@lucene.apache.org
Subject: Re: Reply: (Issue) How improve solr facet performance

Alice,

RE grouping, try Solr 4.8’s new “collapse” qparser w/ the “expand”
SearchComponent.  The ref guide has the docs.  It’s usually a faster, equivalent approach
to group=true.

Do you care to comment further on NewEgg’s apparent switch from Endeca to Solr?  (confirm
true/false and rationale)

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer http://www.linkedin.com/in/davidwsmiley
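
For reference, here is a minimal sketch of how the grouped query quoted further down might look when rewritten with collapse/expand. It is written in Java only to assemble the request string. The field names are taken from Alice's query; the class and method names are hypothetical, and it assumes item_group_id is a single-valued field, which the collapse parser requires. It is an illustration of the suggestion above, not a tested drop-in replacement.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds a collapse/expand request string.
class CollapseQuery {

    static String build(String keyword) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "default_search:" + keyword);
        params.put("defType", "edismax");
        params.put("fl", "item_catalog");
        params.put("rows", "50");
        params.put("sort", "score desc,default_sort desc");
        // Replaces group=true&group.field=item_group_id:
        // keep one representative document per item_group_id value.
        params.put("fq", "{!collapse field=item_group_id}");
        // Return the other members of each collapsed group in a separate
        // "expanded" section, sorted like the original group.sort.
        params.put("expand", "true");
        params.put("expand.sort",
                "stock4sort desc,final_price asc,is_selleritem asc");

        // URL-encode the values and join them into a query string.
        return "select?" + params.entrySet().stream()
                .map(e -> e.getKey() + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(build("paramter"));
    }
}
```

Two caveats on the sketch: expand.sort only orders the documents in the expanded section, while the document that represents each group is chosen by the collapse parser itself (the highest-scoring one by default), so this is not an exact match for group.sort. With collapsing, numFound should reflect the number of collapsed results, which can stand in for group.ngroups, but that is worth verifying against your own index.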


On Tue, May 27, 2014 at 4:17 AM, Alice.H.Yang (mis.cnsh04.Newegg) 41493 < Alice.H.Yang@newegg.com>
wrote:

> Hi, Toke
>
> 1.
>         I set the 3 fields with hundreds of values to use fc and the
> rest to use enum; performance improved 2 times compared with no
> parameter. Then I added facet.method=20, and performance
> improved about 4 times compared with no parameter.
>         I also tried copying the 9 facet fields into one copyField; in my
> test, performance improved about 2.5 times compared with no parameter.
>         So it has improved a lot thanks to your advice. Thanks a lot.
> 2.
>         Now I have another performance issue: grouping performance.
> The amount of data is the same as in the facet performance scenario.
> When the keyword search hits about one million documents, the QTime is
> about 600 ms. (This is not the first run of the query; it is already cached.)
>
> Query URL:
>
> select?fl=item_catalog&q=default_search:paramter&defType=edismax&rows=50&group=true&group.field=item_group_id&group.ngroups=true&group.sort=stock4sort%20desc,final_price%20asc,is_selleritem%20asc&sort=score%20desc,default_sort%20desc
>
> It needs a QTime of about 600 ms.
>
> This query has two notable parameters:
>         1. fl with one field
>         2. group=true and group.ngroups=true
>
> If I set group=false, the QTime is only 1 ms.
> But I need both group and group.ngroups. How can I improve the group
> performance under this requirement? Do you have any advice for me? I'm
> looking forward to your reply.
>
> Best Regards,
> Alice Yang
> +86-021-51530666*41493
> Floor 19,KaiKai Plaza,888,Wanhandu Rd,Shanghai(200042)
>
>
> -----Original Message-----
> From: Toke Eskildsen [mailto:te@statsbiblioteket.dk]
> Sent: May 24, 2014 15:17
> To: solr-user@lucene.apache.org
> Subject: RE: (Issue) How improve solr facet performance
>
> Alice.H.Yang (mis.cnsh04.Newegg) 41493 [Alice.H.Yang@newegg.com] wrote:
> > 1.  I'm sorry, I made a mistake: the total number of documents is
> > 32 million, not 320 million.
> > 2.  The system has plenty of memory for the Solr index: the OS has 256G
> > in total, and I set the Solr Tomcat HEAPSIZE="-Xms25G -Xmx100G"
>
> 100G is a very high number. What special requirements dictate such a 
> large heap size?
>
> > Reply:  9 fields I facet on.
>
> Solr treats each facet separately and with facet.method=fc and 10M 
> hits, this means that it will iterate 9*10M = 90M document IDs and 
> update the counters for those.
>
> > Reply:  3 facet fields have one hundred unique values; the other 6 facet
> > fields have between 3 and 15 unique values.
>
> So very low cardinality. This is confirmed by your low response time 
> of 6ms for 2925 hits.
>
> > And we tested this scenario: if a facet field has few unique values,
> > we add facet.method=enum; this improves performance only a little.
>
> That is a shame: enum is normally the simple answer to a setup like yours.
> Have you tried fine-tuning your fc/enum selection, so that the 3 
> fields with hundreds of values uses fc and the rest uses enum? That 
> might halve your response time.
>
>
> Since the number of unique facets is so low, I do not think that 
> DocValues can help you here. Besides the fine-grained 
> fc/enum-selection above, you could try collapsing all 9 facet-fields 
> into a single field. The idea behind this is that for facet.method=fc, 
> performing faceting on a field with (for example) 300 unique values 
> takes practically the same amount of time as faceting on a field with 
> 1000 unique values: Faceting on a single slightly larger field is much
> faster than faceting on 9 smaller fields.
> After faceting with facet.limit=-1 on the single super-facet-field, 
> you must match the returned values back to their original fields:
>
>
> If you have the facet-fields
>
> field0: 34
> field1: 187
> field2: 78432
> field3: 3
> ...
>
> then collapse them by or-ing a field-specific mask that is bigger than 
> the max in any field, then put it all into a single field:
>
> fieldAll: 0xA0000000 | 34
> fieldAll: 0xA1000000 | 187
> fieldAll: 0xA2000000 | 78432
> fieldAll: 0xA3000000 | 3
> ...
>
> perform the facet request on fieldAll with facet.limit=-1 and split 
> the resulting counts with
>
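> // Entry is a hypothetical (value, count) pair from the fieldAll facet result
> for (Entry entry : facetResultAll) {
>   int value = entry.value & 0x00FFFFFF; // strip the field mask again
>   switch (entry.value & 0xFF000000) {
>     case 0xA0000000:
>       field0.add(value, entry.count);
>       break;
>     case 0xA1000000:
>       field1.add(value, entry.count);
>       break;
> ...
>   }
> }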
>
>
> Regards,
> Toke Eskildsen, State and University Library, Denmark
>
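
The mask-and-split idea from Toke's message above can be sketched as a small self-contained Java program. This is only an illustration under stated assumptions: the class and method names are hypothetical, the values are assumed to fit in the lower 24 bits, the 0xA0 tag base mirrors the masks in the message, and the counts in main are made up. In a real setup the combined values would be written to the single field at index time and the counts would come from the facet response.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: packs (field index, value) into one int and back.
class SuperFacet {
    static final int TAG_BASE = 0xA0;         // field0 -> 0xA0, field1 -> 0xA1, ...
    static final int TAG_SHIFT = 24;          // tag lives in the top 8 bits
    static final int VALUE_MASK = 0x00FFFFFF; // value lives in the lower 24 bits

    // OR the field-specific mask onto the value (done at index time).
    static int encode(int fieldIndex, int value) {
        if (fieldIndex < 0 || TAG_BASE + fieldIndex > 0xFF) {
            throw new IllegalArgumentException("field index out of range");
        }
        if ((value & ~VALUE_MASK) != 0) {
            throw new IllegalArgumentException("value does not fit in 24 bits");
        }
        return ((TAG_BASE + fieldIndex) << TAG_SHIFT) | value;
    }

    static int fieldOf(int encoded) {
        return (encoded >>> TAG_SHIFT) - TAG_BASE;
    }

    static int valueOf(int encoded) {
        return encoded & VALUE_MASK;
    }

    // Split the counts of the combined field back into per-field counts.
    static Map<Integer, Map<Integer, Long>> split(Map<Integer, Long> combined) {
        Map<Integer, Map<Integer, Long>> perField = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : combined.entrySet()) {
            perField.computeIfAbsent(fieldOf(e.getKey()), k -> new TreeMap<>())
                    .put(valueOf(e.getKey()), e.getValue());
        }
        return perField;
    }

    public static void main(String[] args) {
        // Made-up counts standing in for the fieldAll facet response.
        Map<Integer, Long> combined = new LinkedHashMap<>();
        combined.put(encode(0, 34), 5L);
        combined.put(encode(1, 187), 7L);
        combined.put(encode(2, 78432), 2L);
        combined.put(encode(3, 3), 9L);
        System.out.println(split(combined));
    }
}
```

Unlike the switch in the quoted pseudocode, the split here derives the field index arithmetically, so adding a tenth field needs no new case.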