lucene-dev mailing list archives

From "Cao Manh Dat (JIRA)" <>
Subject [jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
Date Mon, 28 May 2018 08:38:00 GMT


Cao Manh Dat commented on SOLR-12338:

Thanks a lot for your review, [~dsmiley]; I have been quite busy recently.
- I think the "hash" variable should not be called that, to avoid confusion, as there is no
hashing. Maybe just "id" or "lockId".
- Do we still need the Random stuff?
- Maybe rename your "SetBlockingQueue" to "SetSemaphore" or, probably better, "SetLock", as it
does not hold anything (Queues hold stuff); see the sketch after this list.
- Can "Semaphore sizeLock" be renamed to "sizeSemaphore" or "sizePermits", as it does not extend Lock?
- Can the "closed" state be removed from SetBlockingQueue altogether? It's not clear it actually
needs to be "closed". It seems wrong; other concurrent mechanisms don't have this notion (no
Queue, Lock, or Semaphore has one). FWIW, I stripped this from the class and the test still passed.
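For concreteness, here is a minimal sketch of the kind of structure these comments describe: a per-id Semaphore held in a ConcurrentHashMap, bounded by a size Semaphore. The names (SetLock, sizeSemaphore) follow the renames suggested above; the details are an assumption, not the patch's exact code.

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical sketch, not the patch: a "set of locked ids" bounded by a
// global permit count. Names follow the renames suggested above.
class SetLock<T> {
  private final ConcurrentHashMap<T, Semaphore> map = new ConcurrentHashMap<>();
  private final Semaphore sizeSemaphore; // bounds how many ids may be locked at once

  SetLock(int maxSize) {
    this.sizeSemaphore = new Semaphore(maxSize);
  }

  // Block until nobody else holds this id, then claim it.
  void add(T id) throws InterruptedException {
    Semaphore mine = new Semaphore(0); // zero permits, so later arrivals must wait
    for (;;) {
      Semaphore existing = map.putIfAbsent(id, mine);
      if (existing == null) break; // our semaphore went in: the id is now ours
      existing.acquire();          // wait until the current holder releases the id
    }
    // The size permit is taken last here; if this acquire is interrupted, the id
    // above stays claimed, which is the try/finally concern discussed below.
    sizeSemaphore.acquire();
  }

  // Release the id; unlike a Lock, this may run on a different thread than add().
  void remove(T id) {
    sizeSemaphore.release();
    Semaphore released = map.remove(id);
    if (released != null) {
      released.release(Integer.MAX_VALUE); // wake every waiter looping in add()
    }
  }
}
{code}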

Perhaps it's better to acquire() the size permit first in add() instead of last, to prevent
lots of producing threads from inserting keys into the map only to eventually wait. Although it might
add an annoying try/finally to add() to ensure we put the permit back if there's an exception
afterwards (e.g. an interrupt). Heck; maybe that's an issue no matter what the sequence is.
I don't think we should do that. {{sizeLock}} is kind of like a maximum number of threads; if
we have reached that number, it seems better to let them wait before trying to enqueue more tasks.
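To make the trade-off concrete, an acquire-first add(), under the same assumed fields as the sketch above, might look like this; the catch block plays the role of the try/finally mentioned:

{code:java}
// Hypothetical acquire-first variant of add(), same assumed fields as above.
void add(T id) throws InterruptedException {
  sizeSemaphore.acquire();          // wait at the global bound before touching the map
  try {
    Semaphore mine = new Semaphore(0);
    for (;;) {
      Semaphore existing = map.putIfAbsent(id, mine);
      if (existing == null) {
        return;                     // inserted; the size permit is kept until remove()
      }
      existing.acquire();           // may throw InterruptedException while waiting
    }
  } catch (InterruptedException e) {
    sizeSemaphore.release();        // nothing was inserted, so return the permit
    throw e;
  }
}
{code}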

Can the value side of the ConcurrentHashMap be a Lock (I guess a ReentrantLock impl)? It seems
like the most direct concept we want; a Semaphore is more than a Lock, as it tracks permits that
we don't need here?
We can't. A Lock (e.g. ReentrantLock) only allows us to lock and unlock from the same thread. In
OrderedExecutor, we lock first in the submitting thread, then unlock in the delegate executor's thread.
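That constraint is easy to demonstrate. The following standalone example (not from the patch) shows that a Semaphore permit can be released from a worker thread, while ReentrantLock.unlock() from a non-owning thread throws IllegalMonitorStateException:

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

public class CrossThreadRelease {
  public static void main(String[] args) throws Exception {
    ExecutorService delegate = Executors.newSingleThreadExecutor();

    Semaphore semaphore = new Semaphore(1);
    semaphore.acquire();                        // "lock" in the submitting thread
    delegate.submit(semaphore::release).get();  // release in the worker thread: fine

    ReentrantLock lock = new ReentrantLock();
    lock.lock();                                // lock in the submitting thread
    delegate.submit(() -> {
      try {
        lock.unlock();                          // not the owner: throws
      } catch (IllegalMonitorStateException e) {
        System.out.println("unlock from another thread failed: " + e);
      }
    }).get();

    delegate.shutdown();
  }
}
{code}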

The hot while loop of map.putIfAbsent seems fishy to me. Even if it may be rare in practice,
I wonder if we can do something simpler? You may get lucky with map.compute* methods on ConcurrentHashMap,
which execute the lambda atomically. Though I don't know if it's bad to block if we try to
acquire a lock within there. I see remove() removes the value from the Map, but perhaps if the
value were a mechanism that tracked that there's a producer pending, then we would not need to remove
the value from the map? If we did this, then maybe that would simplify add()? I'm not sure.
I will think more about this.
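One possible shape of that idea, sketched under the same assumed fields as above: keep one Semaphore(1) per id, never remove it, and let computeIfAbsent do the atomic create-or-get. The acquire happens outside the lambda, since ConcurrentHashMap may hold internal locks while the mapping function runs; the obvious cost is that the map only grows unless stale entries are reaped somehow.

{code:java}
// Hypothetical simplification: one persistent Semaphore(1) per id removes the
// putIfAbsent retry loop entirely. Not the patch's code.
void add(T id) throws InterruptedException {
  map.computeIfAbsent(id, k -> new Semaphore(1)) // atomic create-or-get
     .acquire();                                 // block outside the lambda
  sizeSemaphore.acquire();
}

void remove(T id) {
  sizeSemaphore.release();
  map.get(id).release(); // the value stays in the map, so there is no reinsertion race
}
{code}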

> Replay buffering tlog in parallel
> ---------------------------------
>                 Key: SOLR-12338
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>            Priority: Major
>         Attachments: SOLR-12338.patch, SOLR-12338.patch, SOLR-12338.patch
> Since updates with different ids are independent, it is safe to replay them in parallel.
> This will significantly reduce the recovery time of replicas under high-load indexing.
