lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "David Smiley (JIRA)" <>
Subject [jira] [Updated] (SOLR-8744) Overseer operations need more fine grained mutual exclusion
Date Tue, 24 May 2016 14:06:12 GMT


David Smiley updated SOLR-8744:

Thanks for the review [~noble.paul].  I attached a new version that uses List<String>
instead of '/' separated strings.  It's even fewer lines of code :-)

bq. It's not enough to ensure that your parent is not locked. It is also required that none
of your children are locked too. For example, C1 cannot be locked if C1/S1/R1 or C1/S2/R1
is locked

Did you see the javadocs?  I could have added more, and I added a little more.  Or maybe you
are unfamiliar with how ReentrantReadWriteLock works?  See, there are two locks at each node
in the tree -- a read lock and a write lock, courtesy of ReentrantReadWriteLock which does
most of the work here.  Multiple read locks can be open for a ReentrantReadWriteLock but the
write lock is mutually exclusive to them all.  The Lock that SmileyLockTree returns at a given
"node" will first obtain a read lock at all ancestors (top down) and then obtain a write lock
on the node for itself.  If a Lock for {{C1/S1/R1}} is obtained, it will be a LayeredLock
with "thisLock" being the write lock for itself (R1) and the "parentLock" is a chain of LeveledLock
_of read locks_ to its ancestors.  The origin of the chain (i.e. if you keep following LevelledLock
parentLock instances) is the read lock on the root, which is a direct instance of ReentrantReadWriteLock.readLock
whereas all the rest are the anonymous inner class ReadWriteLock that returns a LayeredLock
for it's read & write sides.  So when the caller calls lock(), the first concrete Lock
impl to be locked with be the read Lock on root (empty string), then the read lock on {{C1}}
then the read lock on {{C1/S1}} then the _write lock_ on {{C1/S1/R1}}.  For the duration of
time this is locked, any siblings of R1 or R1's ancestors may be locked, or they could have
been locked before.  What would block would be a an ancestor -- {{C1}} in your example would
block because the write lock blocks for read locks, and the read block is locked by not only
{{C1/S1/R1}} but also {{C1/S2/R1}} in your example.

It's a little functional which maybe you aren't used to, and it leverages useful abstractions
and implementations from the JDK that maybe you are unfamiliar with, so perhaps that's why
it's not clear how this works.  Do you grok this [~dragonsinth]?

> Overseer operations need more fine grained mutual exclusion
> -----------------------------------------------------------
>                 Key: SOLR-8744
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>    Affects Versions: 5.4.1
>            Reporter: Scott Blum
>            Assignee: Noble Paul
>              Labels: sharding, solrcloud
>         Attachments: SOLR-8744.patch, SOLR-8744.patch,,
> SplitShard creates a mutex over the whole collection, but, in practice, this is a big
scaling problem.  Multiple split shard operations could happen at the time time, as long as
different shards are being split.  In practice, those shards often reside on different machines,
so there's no I/O bottleneck in those cases, just the mutex in Overseer forcing the operations
to be done serially.
> Given that a single split can take many minutes on a large collection, this is a bottleneck
at scale.
> Here is the proposed new design
> There are various Collection operations performed at Overseer. They may need exclusive
access at various levels. Each operation must define the Access level at which the access
is required. Access level is an enum. 
> SHARD(2)
> The Overseer node maintains a tree of these locks. The lock tree would look as follows.
The tree can be created lazily as and when tasks come up.
> {code}
> Legend: 
> C1, C2 -> Collections
> S1, S2 -> Shards 
> R1,R2,R3,R4 -> Replicas
>                  Cluster
>                 /       \
>                /         \         
>               C1          C2
>              / \         /   \     
>             /   \       /     \      
>            S1   S2      S1     S2
>         R1, R2  R3.R4  R1,R2   R3,R4
> {code}
> When the overseer receives a message, it tries to acquire the appropriate lock from the
tree. For example, if an operation needs a lock at a Collection level and it needs to operate
on Collection C1, the node C1 and all child nodes of C1 must be free. 
> h2.Lock acquiring logic
> Each operation would start from the root of the tree (Level 0 -> Cluster) and start
moving down depending upon the operation. After it reaches the right node, it checks if all
the children are free from a lock.  If it fails to acquire a lock, it remains in the work
queue. A scheduler thread waits for notification from the current set of tasks . Every task
would do a {{notify()}} on the monitor of  the scheduler thread. The thread would start from
the head of the queue and check all tasks to see if that task is able to acquire the right
lock. If yes, it is executed, if not, the task is left in the work queue.  
> When a new task arrives in the work queue, the schedulerthread wakes and just try to
schedule that task.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message