lucene-dev mailing list archives

From "Scott Blum (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-8973) TX-frenzy on Zookeeper when collection is put to use
Date Thu, 14 Apr 2016 21:16:25 GMT

    [ https://issues.apache.org/jira/browse/SOLR-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15241923#comment-15241923 ]

Scott Blum edited comment on SOLR-8973 at 4/14/16 9:16 PM:
-----------------------------------------------------------

[~shalinmangar] I've come to the conclusion that ZkStateReader isn't doing as well as it could
be.  Adding watchers in constructState() seems (retroactively) like a hack.  It doesn't correctly
cover the case where a collection parent node exists (e.g. /solr/collections/coll1) but its
state.json child has not been created yet.
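The gap described above (parent node present, state.json absent) is the kind of case ZooKeeper's exists() watch is meant for: exists() can leave a watch on a path that does not exist yet, and that watch fires when the node is created. The following is a minimal, hypothetical sketch of that pattern only. It is not Solr's ZkStateReader and not the real ZooKeeper client API; FakeZk, StateReader, watchCollection and cached are made-up names, and the in-memory store models only the node-creation event.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class ExistsWatchSketch {

    // Hypothetical in-memory stand-in for ZooKeeper (NOT the real client API).
    // It models one behavior only: a watch left by exists() on a missing path
    // fires once when that path is created.
    static class FakeZk {
        private final Map<String, String> nodes = new HashMap<>();
        private final Map<String, List<Consumer<String>>> watches = new HashMap<>();

        // Registers a one-shot watch on the path and reports whether it exists.
        boolean exists(String path, Consumer<String> watcher) {
            watches.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
            return nodes.containsKey(path);
        }

        String getData(String path) {
            return nodes.get(path);
        }

        // Creates the node, then fires and clears any watches on that path.
        void create(String path, String data) {
            nodes.put(path, data);
            List<Consumer<String>> fired = watches.remove(path);
            if (fired != null) {
                fired.forEach(w -> w.accept(path));
            }
        }
    }

    // Hypothetical reader that keeps a local cache of each collection's state.json.
    static class StateReader {
        private final FakeZk zk;
        private final Map<String, String> cache = new HashMap<>();

        StateReader(FakeZk zk) {
            this.zk = zk;
        }

        void watchCollection(String coll) {
            String path = "/solr/collections/" + coll + "/state.json";
            // The watch is registered even when state.json is missing, so its later
            // creation re-runs this method and fills the cache.
            if (zk.exists(path, p -> watchCollection(coll))) {
                cache.put(coll, zk.getData(path));
            }
        }

        String cached(String coll) {
            return cache.get(coll);
        }
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        StateReader reader = new StateReader(zk);
        reader.watchCollection("coll1");            // state.json has not been created yet
        System.out.println(reader.cached("coll1")); // prints null
        zk.create("/solr/collections/coll1/state.json", "{\"coll1\":{}}");
        System.out.println(reader.cached("coll1")); // prints {"coll1":{}}
    }
}
```

The design point is that registration does not depend on the node already existing, so a reader that starts watching early still picks up the state once it appears instead of falling back to per-request reads.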

I believe I have a patch and test to fix this.  I've attached it to this JIRA, but I'm not sure
whether I should create a new one.


was (Author: dragonsinth):
[~shalinmangar] I've come to the conclusion that ZkStateReader isn't doing as well as it could
be.  Adding watchers in constructState() seems (retroactively) like a hack.  It doesn't correctly
cover the case where a collection parent node exists (e.g. /solr/collections/coll1) but no
state.json child yet appears.

I believe I have a patch and test to fix this.  Not sure whether I should attach to this JIRA
or create a new one.

> TX-frenzy on Zookeeper when collection is put to use
> ----------------------------------------------------
>
>                 Key: SOLR-8973
>                 URL: https://issues.apache.org/jira/browse/SOLR-8973
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, master, 5.6
>            Reporter: Janmejay Singh
>            Assignee: Shalin Shekhar Mangar
>              Labels: collections, patch-available, solrcloud, zookeeper
>         Attachments: SOLR-8973-ZkStateReader.patch, SOLR-8973.patch
>
>
> This is to do with a distributed data race. Core creation happens at a time when the collection
> is not yet visible to the node. In this case a fallback code path is used which dereferences the
> collection state lazily (on demand) instead of setting a watch and keeping it cached locally.
> As a result, as requests to the core mount, the node issues a proportional number of ZK fetches
> of the collection state. On a large SolrCloud cluster this generates several Gbps of TX traffic
> on the ZK nodes. This hurts indexing throughput (which floors), in addition to running the ZK
> nodes out of network bandwidth.
> On smaller SolrCloud clusters this is hard to run into, because the probability of the race
> materializing is lower.
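To make the scaling argument concrete, here is a toy model (hypothetical names, not Solr code) comparing the lazy fallback path, which reads collection state from ZK on every request, with a watched path that reads it once and serves later requests from a local cache. The request count and state size are illustrative assumptions, not figures from this issue; the only point is that lazy-path traffic grows linearly with request volume.

```java
public class LazyVsWatchedSketch {

    // Hypothetical stand-in for ZK that counts reads of a collection's state.
    static class ZkCounter {
        long fetches = 0;
        long bytesSent = 0;

        String fetchState(int stateBytes) {
            fetches++;
            bytesSent += stateBytes; // TX bytes the ZK server would send
            return "state";
        }
    }

    // Lazy fallback: every request dereferences collection state from ZK.
    static long lazy(ZkCounter zk, int requests, int stateBytes) {
        for (int i = 0; i < requests; i++) {
            zk.fetchState(stateBytes);
        }
        return zk.bytesSent;
    }

    // Cached path: the first request fills a local cache; later requests hit it.
    // Re-fetching when the state actually changes (the watch firing) is not modeled.
    static long watched(ZkCounter zk, int requests, int stateBytes) {
        String cached = null;
        for (int i = 0; i < requests; i++) {
            if (cached == null) {
                cached = zk.fetchState(stateBytes);
            }
        }
        return zk.bytesSent;
    }

    public static void main(String[] args) {
        int requests = 100_000;  // assumed request count
        int stateBytes = 50_000; // assumed size of state.json in bytes
        System.out.println("lazy:    " + lazy(new ZkCounter(), requests, stateBytes) + " bytes");
        System.out.println("watched: " + watched(new ZkCounter(), requests, stateBytes) + " bytes");
    }
}
```

With these assumed inputs the lazy path sends 5000000000 bytes (about 5 GB) versus 50000 bytes for the cached path; invalidation on real state changes is deliberately left out to keep the comparison minimal.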



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

