lucene-solr-user mailing list archives

From Jichi Guo <jichi...@gmail.com>
Subject Re: How to speed up field collapsing on large number of groups
Date Thu, 14 Jul 2016 03:07:49 GMT
Hi everyone,

Is it possible to speed up collapsing on a large index through parallelization, without sharding?

Or can we conclude that sharding is currently the only way to speed up slow collapsing queries by a large factor?

I tried manually parallelizing CollapsingQParserPlugin across Lucene segments. In particular, I added a thread pool to IndexSearcher and then parallelized CollapsingQParserPlugin.CollapsingFieldValueCollector, which I rewrote to use the LeafCollector introduced in Lucene 5.
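
A minimal sketch of the per-segment idea, written in plain Java rather than against Lucene's LeafCollector API. The Segment record, collapseSegment and collapseParallel are hypothetical names, each segment is modeled as plain arrays of group ids and scores, and "keep the highest score per group" stands in loosely for the max= selection. Each segment is collapsed on its own thread, and the partial results are then merged, because documents of one group can live in several segments (requires Java 16 or later for the record type).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelCollapseSketch {

    // Hypothetical model of one segment: parallel arrays holding the group id
    // and score of each matching document. Not a Lucene type.
    record Segment(int[] groupIds, float[] scores) {}

    // Collapse one segment: keep the highest score seen for each group.
    static Map<Integer, Float> collapseSegment(Segment seg) {
        Map<Integer, Float> best = new HashMap<>();
        for (int i = 0; i < seg.groupIds().length; i++) {
            best.merge(seg.groupIds()[i], seg.scores()[i], Math::max);
        }
        return best;
    }

    // Collapse every segment on the pool, then merge the partial results on
    // the calling thread. The merge is needed because a group can span segments.
    static Map<Integer, Float> collapseParallel(List<Segment> segments, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<Integer, Float>>> partials = new ArrayList<>();
            for (Segment seg : segments) {
                partials.add(pool.submit(() -> collapseSegment(seg)));
            }
            Map<Integer, Float> merged = new HashMap<>();
            for (Future<Map<Integer, Float>> f : partials) {
                f.get().forEach((group, score) -> merged.merge(group, score, Math::max));
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Segment> segments = List.of(
                new Segment(new int[]{1, 2, 1}, new float[]{0.5f, 0.9f, 0.7f}),
                new Segment(new int[]{2, 3}, new float[]{1.2f, 0.1f}));
        Map<Integer, Float> result = collapseParallel(segments, 2);
        System.out.println(new TreeMap<>(result)); // prints {1=0.7, 2=1.2, 3=0.1}
    }
}
```

In this sketch the final merge runs on a single thread and touches every group found in every segment, so with about 2 million groups it remains a sequential cost that the per-segment parallelism does not remove.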

But I was surprised that parallelization made the overall performance worse.

Without parallelization, the first couple of Lucene segments took the majority of the collapsing time, and the rest took almost no time.

After parallelization, collapsing took a noticeable amount of time on every segment, and the overall time became about 20% longer.
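
A rough best-case model of why skewed segments limit the gain: assuming per-segment cost is proportional to matching documents (an assumption; thread-pool and merge overhead are ignored), parallel time can never drop below the cost of the largest single segment, nor below total work divided by the number of threads. The class name and the segment costs below are hypothetical, not measurements from this index.

```java
final class SegmentSkewBound {

    // Best-case speedup of running per-segment work in parallel.
    // Assumes cost is proportional to each segment's work and ignores overhead.
    static double maxSpeedup(long[] segmentCosts, int threads) {
        long total = 0;
        long largest = 0;
        for (long c : segmentCosts) {
            total += c;
            largest = Math.max(largest, c);
        }
        // Parallel time is bounded below by the largest segment and by total / threads.
        double parallelTime = Math.max(largest, (double) total / threads);
        return total / parallelTime;
    }

    public static void main(String[] args) {
        // Hypothetical costs: two large segments dominate the work.
        long[] skewed = {8_000_000, 7_000_000, 500_000, 300_000, 200_000};
        System.out.println(maxSpeedup(skewed, 8)); // prints 2.0
        // Hypothetical costs: evenly sized segments.
        long[] even = {4_000_000, 4_000_000, 4_000_000, 4_000_000};
        System.out.println(maxSpeedup(even, 4)); // prints 4.0
    }
}
```

When one or two segments dominate, this ceiling stays close to 1x, so any coordination overhead can make the parallel version slower overall, which is consistent with the roughly 20% slowdown described above.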

  

Thanks!

On Jun 28 2016, at 1:08 pm, jichi <jichifly@gmail.com> wrote:

> Hi everyone,
>
> I am using Solr 4.10 to index 20 million documents without sharding.
> Each document has a groupId field, and there are about 2 million groups.
> I found that searching with collapsing on groupId is significantly slower
> than searching without collapsing, especially when combined with facet
> queries.
>
> What would be the general approach to speed up field collapsing by 2 to 4
> times?
> Would sharding the index help?
> Is it possible to optimize collapsing without sharding?
>
> The filter parameter for collapsing looks like this:
>
>     q=*:*&fq={!collapse field=groupId max=sum(...a long formula...)}
>
> I also put this fq into the warmup queries XML to warm up the caches. But
> when q changes and more fq parameters are added, the collapsing search
> still takes about 3 to 5 seconds. Without collapsing, the search finishes
> within 2 seconds.
>
> I am thinking of manually optimizing CollapsingQParserPlugin through
> parallelization or extra caching. For example, is it possible to
> parallelize the collapsing collector across different Lucene index
> segments?
>
> Thanks!
>
> --
> jichi

