lucene-dev mailing list archives

From "Cao Manh Dat (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
Date Mon, 28 May 2018 08:38:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16492410#comment-16492410 ]

Cao Manh Dat commented on SOLR-12338:
-------------------------------------

Thanks a lot for your review, [~dsmiley]; I have been too busy recently.
{quote}
- I think the "hash" variable should not be called this to avoid confusion as there is no
hashing. Maybe just "id" or "lockId"
- Do we still need the Random stuff?
- Maybe rename your "SetBlockingQueue" to "SetSemaphore" or probably better "SetLock" as it
does not hold anything (Queues hold stuff)
- Can "Semaphore sizeLock" be renamed to "sizeSemaphore" or "sizePermits", as it does not
extend Lock?
- Can the "closed" state be removed from SetBlockingQueue altogether? It's not clear it actually
needs to be "closed". It seems wrong; other concurrent mechanisms don't have this notion (no
Queue, Lock, or Semaphore does, etc.) FWIW I stripped this from the class and the test passed.
{quote}
+1

{quote}
Perhaps it's better to acquire() the size permit first in add() instead of last, to prevent
lots of producing threads inserting keys into a map only to eventually wait. Although it might
add annoying try-finally to add() to ensure we put the permit back if there's an exception
after (e.g. interrupt). Heck; maybe that's an issue no matter what the sequence is.
{quote}
I don't think we should do that. {{sizeLock}} acts as a cap on the maximum number of in-flight
tasks; once we reach that number, it seems better to let producers wait before trying to enqueue
more tasks.
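For context, the role of the size semaphore can be sketched as follows (a hypothetical class with invented names, not the actual patch): a Semaphore caps how many tasks are in flight, producers block in add() once the cap is reached, and the permit is put back if hand-off to the delegate fails.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: a Semaphore bounds the number of in-flight tasks,
// so producers block in add() once the cap is reached.
class BoundedExecutorSketch {
    private final Semaphore sizeSemaphore;
    private final ExecutorService delegate;

    BoundedExecutorSketch(int maxTasks, ExecutorService delegate) {
        this.sizeSemaphore = new Semaphore(maxTasks);
        this.delegate = delegate;
    }

    void add(Runnable task) throws InterruptedException {
        sizeSemaphore.acquire();             // wait if too many tasks are pending
        try {
            delegate.execute(() -> {
                try {
                    task.run();
                } finally {
                    sizeSemaphore.release(); // released on the worker thread
                }
            });
        } catch (RuntimeException e) {
            sizeSemaphore.release();         // put the permit back if submission failed
            throw e;
        }
    }
}
```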

{quote}
Can the value side of the ConcurrentHashMap be a Lock (I guess ReentrantLock impl)? It seems
like the most direct concept we want; Semaphore is more than a Lock as it tracks permits that
we don't need here?
{quote}
We can't. A Lock (e.g. ReentrantLock) only allows us to lock and unlock from the same thread.
In the OrderedExecutor, we lock first and then unlock on the thread of the delegate executor.
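This constraint can be demonstrated directly (a standalone sketch, not code from the patch): a Semaphore permit may be released by a thread other than the one that acquired it, whereas ReentrantLock.unlock() from a non-owning thread throws IllegalMonitorStateException.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

class CrossThreadRelease {
    // Acquire a permit on this thread and release it on another; returns the
    // number of available permits afterwards (1 if the release succeeded).
    static int semaphoreCrossThread() throws InterruptedException {
        Semaphore sem = new Semaphore(1);
        sem.acquire();
        Thread t = new Thread(sem::release);
        t.start();
        t.join();
        return sem.availablePermits();
    }

    // Lock on this thread and try to unlock on another; returns true if the
    // foreign unlock threw IllegalMonitorStateException.
    static boolean reentrantLockRejectsForeignUnlock() throws InterruptedException {
        ReentrantLock lock = new ReentrantLock();
        lock.lock();
        final boolean[] threw = {false};
        Thread t = new Thread(() -> {
            try {
                lock.unlock();
            } catch (IllegalMonitorStateException e) {
                threw[0] = true;
            }
        });
        t.start();
        t.join();
        lock.unlock(); // clean up on the owning thread
        return threw[0];
    }
}
```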

{quote}
The hot while loop of map.putIfAbsent seems fishy to me. Even if it may be rare in practice,
I wonder if we can do something simpler? You may get lucky with map.compute* methods on ConcurrentHashMap
which execute the lambda atomically. Though I don't know if it's bad to block if we try to
acquire a lock within there. I see remove() removes the value of the Map, but perhaps if the
value were a mechanism that tracked that there's a producer pending, then we should not remove
the value (the lock) from the map? If we did this, then maybe that would simplify add()? I'm not sure.
{quote}
I will think more about this.
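For reference, the hot loop under discussion looks roughly like this (a hypothetical sketch with invented names, not the actual patch): each producer tries to install its own zero-permit Semaphore for the id, and any producer that loses the race waits on the already-installed lock before retrying.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical sketch of the putIfAbsent retry loop: at most one producer
// holds the per-id lock at a time; losers wait, then race again.
class PerIdLockSketch {
    private final ConcurrentHashMap<Integer, Semaphore> locks = new ConcurrentHashMap<>();

    Semaphore acquireFor(int id) throws InterruptedException {
        Semaphore mine = new Semaphore(0);   // zero permits: others must wait on it
        for (;;) {
            Semaphore existing = locks.putIfAbsent(id, mine);
            if (existing == null) {
                return mine;                 // we installed our lock; proceed
            }
            existing.acquire();              // wait for the current holder, then retry
        }
    }

    void releaseFor(int id, Semaphore mine) {
        locks.remove(id);                    // later producers install a fresh lock
        mine.release(Integer.MAX_VALUE);     // wake every producer waiting in the loop
    }
}
```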

> Replay buffering tlog in parallel
> ---------------------------------
>
>                 Key: SOLR-12338
>                 URL: https://issues.apache.org/jira/browse/SOLR-12338
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>            Priority: Major
>         Attachments: SOLR-12338.patch, SOLR-12338.patch, SOLR-12338.patch
>
>
> Since updates with different ids are independent, it is safe to replay them in parallel.
This will significantly reduce the recovery time of replicas in high-load indexing
environments.


