lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-12338) Replay buffering tlog in parallel
Date Thu, 17 May 2018 15:19:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-12338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479197#comment-16479197
] 

David Smiley commented on SOLR-12338:
-------------------------------------

Maybe you can propose {{SetBlockingQueue}} (or whatever name we settle on) to Guava?  Even
if it's not accepted ultimately; there might be some great feedback and/or pointers to something
similar that proves useful, as this stuff is hard so the more eyes the better.

I like that you've avoided hash collisions altogether by not doing hashes!  Use of ConcurrentHashMap<Integer,...>
makes sense to me for such an approach.  However it appears we have some complexity to deal
with since keys need to be added and removed on demand, safely, which seems to be quite tricky.

* I think the "hash" variable should not be called this to avoid confusion as there is no
hashing.  Maybe just "id" or "lockId"
* Do we still need the Random stuff?
* Maybe rename your "SetBlockingQueue" to "SetSemaphore" or probably better "SetLock" as it
does not hold anything (Queues hold stuff)
* Can "Semaphore sizeLock" be renamed to "sizeSemaphore" or "sizePermits" is it does not extend
Lock?
* Can the "closed" state be removed from SetBlockingQueue altogether?  It's not clear it actually
needs to be "closed".  It seems wrong; other concurrent mechanisms don't have this notion
(no Queue, Lock, or Semaphore does, etc.)  FWIW I stripped this from the class and the test
passed.
* Perhaps its better to acquire() the size permit first in add() instead of last to prevent
lots of producing threads inserting keys into a map only to eventually wait.  Although it
might add annoying try-finally to add() to ensure we put the permit back if there's an exception
after (e.g. interrupt).  Heck; maybe that's an issue no matter what the sequence is.
* Can the value side of the ConcurrentHashMap be a Lock (I guess ReentrantLock impl)?  It
seems like the most direct concept we want; Semaphore is more than a Lock as it tracks permits
that we don't need here?
* The hot while loop of map.putIfAbsent seems fishy to me.  Even if it may be rare in practice,
I wonder if we can do something simpler?  You may get luck with map.compute\* methods on ConcurrentHashMap
which execute the lambda atomically.  Though I don't know if it's bad to block if we try to
acquire a lock within there.  I see remove() removes the value of the Map but perhaps it the
value were a mechanism that tracked that there's a producer pending, then we should not remove
the value from the lock?  If we did this, then maybe that would simplify add()?  I'm not sure.

Perhaps a simpler approach would involve involve a Set of weakly referenced objects, and thus
we don't need to worry about removal.  In such a design add() would need to return a reference
to the member of the set, and that object would have a "release()" method when done.  I'm
not sure if in practice these might be GC'ed fast enough if they end up being usually very
temporary?  Shrug.

> Replay buffering tlog in parallel
> ---------------------------------
>
>                 Key: SOLR-12338
>                 URL: https://issues.apache.org/jira/browse/SOLR-12338
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>            Priority: Major
>         Attachments: SOLR-12338.patch, SOLR-12338.patch, SOLR-12338.patch
>
>
> Since updates with different id are independent, therefore it is safe to replay them
in parallel. This will significantly reduce recovering time of replicas in high load indexing
environment. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message