lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis.gospodne...@gmail.com>
Subject Re: field collapsing performance in sharded environment
Date Wed, 20 Nov 2013 04:28:55 GMT
Have a look at https://issues.apache.org/jira/browse/SOLR-5027 +
https://wiki.apache.org/solr/CollapsingQParserPlugin

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Wed, Nov 13, 2013 at 2:46 PM, David Anthony Troiano <
dtroiano@basistech.com> wrote:

> Hello,
>
> I'm hitting a performance issue when using field collapsing in a
> distributed Solr setup and I'm wondering if others have seen it and if
> anyone has an idea to work around. it.
>
> I'm using field collapsing to deduplicate documents that have the same near
> duplicate hash value, and deduplicating at query time (as opposed to
> filtering at index time) is a requirement.  I have a sharded setup with 10
> cores (not SolrCloud), each having ~1000 documents each.  Of the 10k docs,
> most have a unique near duplicate hash value, so there are about 10k unique
> values for the field that I'm grouping on.  The grouping parameters that
> I'm using are:
>
> group=true
> group.field=<near dupe hash field>
> group.main=true
>
> I'm attempting distributed queries (&shards=s1,s2,...,s10) where the only
> difference is the absence or presence of these three grouping parameters
> and I'm consistently seeing a marked difference in performance (as a
> representative data point, 200ms latency without grouping and 1600ms with
> grouping).  Interestingly, if I put all 10k docs on the same core and query
> that core independently with and without grouping, I don't see much of a
> latency difference, so the performance degradation seems to exist only in
> the sharded setup.
>
> Is there a known performance issue when field collapsing in a sharded setup
> (perhaps only manifests when the grouping field has many unique values), or
> have other people observed this?  Any ideas for a workaround?  Note that
> docs in my sharded setup can only have the same signature if they're in the
> same shard, so perhaps that can be used to boost perf, though I don't see
> an exposed way to do so.
>
> A follow-on question is whether we're likely to see the same issue if /
> when we move to SolrCloud.
>
> Thanks,
> Dave
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message