lucene-dev mailing list archives

From "Janmejay Singh (JIRA)" <>
Subject [jira] [Commented] (SOLR-8973) TX-frenzy on Zookeeper when collection is put to use
Date Thu, 14 Apr 2016 17:43:25 GMT


Janmejay Singh commented on SOLR-8973:

No, there is a difference in what the overseer and the core-api (on a different node) see at the same
instant. Some ZK nodes may be lagging (ZK does not ensure visibility of changes across all
nodes at the same time). When clients can't tolerate a delay in the visibility of changes, they
need to execute a sync operation before the read.

Overseer's session may be connected to a zk-node that is ahead of the zk-node that the core-node
is connected to. So while overseer sees the change, core-node will not (unless it executes
sync before read).
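
To illustrate the lag described above, here is a minimal toy model, not ZooKeeper itself. The class LaggingEnsemble and its methods are hypothetical names made up for this sketch, and it simplifies by assuming that only the server that handled a write has applied it. In the real Java client, the corresponding call would be the asynchronous ZooKeeper.sync(path, callback, ctx), issued before getData.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (assumption, not ZooKeeper): followers apply committed writes lazily. */
class LaggingEnsemble {
    private final List<String> log = new ArrayList<>(); // committed writes, in order
    private final int[] applied;                        // log entries applied per server

    LaggingEnsemble(int servers) {
        applied = new int[servers];
    }

    /** Commit a write through one server; only that server applies it right away. */
    void write(int server, String value) {
        log.add(value);
        applied[server] = log.size();
    }

    /** Plain read: returns what the connected server has applied so far (may be stale). */
    String read(int server) {
        return applied[server] == 0 ? null : log.get(applied[server] - 1);
    }

    /** Sync: the connected server catches up with everything committed so far. */
    void sync(int server) {
        applied[server] = log.size();
    }

    /** Sync before read, as a client that cannot tolerate stale data would do. */
    String syncThenRead(int server) {
        sync(server);
        return read(server);
    }

    public static void main(String[] args) {
        LaggingEnsemble zk = new LaggingEnsemble(2);
        zk.write(0, "collection-v1");            // overseer is connected to server 0
        System.out.println(zk.read(1));          // core-node on server 1: stale read
        System.out.println(zk.syncThenRead(1));  // core-node after sync: sees the change
    }
}
```

In this sketch the plain read on server 1 misses the write made through server 0, while the sync-then-read on the same server observes it.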

If all nodes saw the same version as overseer, the race wouldn't exist at all.

We can change the patch to lazily set up a watch for a collection that is fetched using the active (on-demand)
fetcher. In this model, once the fetch completes successfully, it sets up a watch for the collection
before returning the fetched collection-def.
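
A rough sketch of that model follows. It is not the patch and not Solr's actual state-reader code: FakeStateStore and LazyWatchingReader are hypothetical names, the store is an in-memory stand-in for ZK, and its watches are persistent for simplicity (real ZK watches fire once and must be re-registered). The fetch counter shows how the number of remote reads stops growing with request volume once the watch is in place.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** In-memory stand-in for the ZK-backed store (assumption); counts remote reads. */
class FakeStateStore {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, Set<LazyWatchingReader>> watchers = new HashMap<>();
    int fetchCount = 0;

    String fetch(String collection) {
        fetchCount++;
        return data.get(collection);
    }

    void watch(String collection, LazyWatchingReader reader) {
        watchers.computeIfAbsent(collection, k -> new HashSet<>()).add(reader);
    }

    void update(String collection, String state) {
        data.put(collection, state);
        // Notify every reader that registered a watch on this collection.
        for (LazyWatchingReader r : watchers.getOrDefault(collection, new HashSet<>())) {
            r.onChange(collection);
        }
    }
}

/** Fetches a collection on demand, then watches and caches it from that point on. */
class LazyWatchingReader {
    private final FakeStateStore store;
    private final Map<String, String> cache = new HashMap<>();

    LazyWatchingReader(FakeStateStore store) {
        this.store = store;
    }

    String getCollection(String name) {
        String cached = cache.get(name);
        if (cached != null) {
            return cached;                 // served locally, no remote read
        }
        String fetched = store.fetch(name); // on-demand fallback path
        if (fetched != null) {
            store.watch(name, this);        // set up the watch before returning
            cache.put(name, fetched);
        }
        return fetched;
    }

    /** Watch callback: refresh the cached state once per change. */
    void onChange(String name) {
        cache.put(name, store.fetch(name));
    }
}
```

With this shape, a collection that was not yet visible on the first request costs one fetch per request only until the first successful fetch; after that, remote reads happen once per change rather than once per request.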

> TX-frenzy on Zookeeper when collection is put to use
> ----------------------------------------------------
>                 Key: SOLR-8973
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 5.0, 5.1, 5.2, 5.3, 5.4, 5.5, master, 5.6
>            Reporter: Janmejay Singh
>            Assignee: Shalin Shekhar Mangar
>              Labels: collections, patch-available, solrcloud, zookeeper
>         Attachments: SOLR-8973.patch
> This is to do with a distributed data-race. Core-creation happens at a time when the collection
> is not yet visible to the node. In this case a fallback code-path is used which dereferences
> collection-state lazily (on demand) as opposed to setting a watch and keeping it cached locally.
> Due to this, as requests towards the core mount, they generate a proportionate number of ZK
> fetches for the collection. On a large solr-cloud cluster, this generates several Gbps of TX traffic
> on ZK nodes. This affects indexing throughput (which floors) in addition to running the ZK nodes
> out of network bandwidth.
> On smaller solr-cloud clusters it's hard to run into, because the probability of this race
> materializing is lower.
