accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Synchronized Access to ZooCache Causing Threads to Block
Date Wed, 12 Feb 2014 23:01:13 GMT
Also, for completeness: I filed ACCUMULO-2362 to work on concurrent 
accesses to the same instance in the same JVM.

Also, I misspoke earlier: much of the lock contention comes out of the 
Tables class, not from the Instance. ZooCache keeps a static map of 
instance to ZooCache, and that map is used by a wide range of API calls.
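
To make the contention pattern concrete, here is a minimal, self-contained sketch. It is not the Accumulo source: the class name, the Cache stand-in, and the key format are hypothetical. The first method mirrors what the thread dump further down suggests (a lookup guarded by the class monitor); the second is one possible lock-free alternative on the hit path, and is not necessarily what ACCUMULO-2362 will end up doing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; none of these names come from Accumulo.
class CacheRegistrySketch {

    /** Stand-in for a per-instance cache object. */
    static final class Cache {
        final String key;
        Cache(String key) { this.key = key; }
    }

    // Variant 1: every caller, hit or miss, must take the class monitor,
    // so concurrent lookups serialize behind one another.
    private static final Map<String, Cache> LOCKED = new HashMap<>();

    static synchronized Cache getInstanceLocked(String zooKeepers, int timeout) {
        String key = zooKeepers + ":" + timeout; // assumed key format
        Cache c = LOCKED.get(key);
        if (c == null) {
            c = new Cache(key);
            LOCKED.put(key, c);
        }
        return c;
    }

    // Variant 2: ConcurrentHashMap.get() does not block, so lookups of an
    // existing entry never wait on a lock; computeIfAbsent runs only on a miss
    // and still guarantees a single Cache per key.
    private static final ConcurrentHashMap<String, Cache> CONCURRENT =
            new ConcurrentHashMap<>();

    static Cache getInstanceConcurrent(String zooKeepers, int timeout) {
        String key = zooKeepers + ":" + timeout; // assumed key format
        Cache c = CONCURRENT.get(key);
        return c != null ? c : CONCURRENT.computeIfAbsent(key, Cache::new);
    }
}
```

Both variants hand back the same object for the same key; the difference is only in whether threads that find an existing entry have to queue for a monitor first.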

On 2/12/14, 3:58 PM, Josh Elser wrote:
> ACCUMULO-1833 was merged into 1.5.1-SNAPSHOT a long time ago. I probably
> never cleaned up the branch after I finished the ticket.
>
> I believe John Vines started looking at using Curator, but I think he
> decided in the end that there weren't significant gains to be had by
> using it. I'm sure he commented on the ticket he had for it.
>
> On 2/12/14, 3:56 PM, Ariel Valentin wrote:
>> Is the 1833 branch going to be part of 1.5.1?
>> I recall reading somewhere that there was interest in using Curator to
>> ameliorate working with zookeeper. Is that still part of the release
>> roadmap?
>>
>> Thanks,
>> Ariel
>> ---
>> Sent from my mobile device. Please excuse any errors.
>>
>>> On Feb 12, 2014, at 3:13 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>>
>>> Great, that helps. Thanks for the info, Ariel!
>>>
>>> I think this might be an area we want to revisit in later versions of
>>> Accumulo to make the client API implementations a little more robust
>>> and supportive of concurrent usage.
>>>
>>>> On 2/12/14, 3:10 PM, Ariel Valentin wrote:
>>>> Josh,
>>>>
>>>> The symptom is that we hit a point where a single server seems
>>>> "unresponsive" but we do not see anything unusual going on in that
>>>> machine and it seems idle. No heavy CPU, no I/O wait, low load average;
>>>> however when we add additional instances of the JVM our capacity seems
>>>> to increase linearly.
>>>>
>>>> Based on thread dumps and profiler stats it appears that under "heavy"
>>>> load most of our threads are blocked trying to access ZooCache.
>>>>
>>>>
>>>> Ariel Valentin
>>>> e-mail: ariel@arielvalentin.com
>>>> website: http://blog.arielvalentin.com
>>>> skype: ariel.s.valentin
>>>> twitter: arielvalentin
>>>> linkedin: http://www.linkedin.com/profile/view?id=8996534
>>>> ---------------------------------------
>>>> *simplicity *communication
>>>> *feedback *courage *respect
>>>>
>>>>
>>>> On Wed, Feb 12, 2014 at 1:41 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>>>
>>>>     Didn't mean to ask about the subject matter, but how you were using
>>>>     the API. Are you actually seeing contention on ZooCache?
>>>>
>>>>
>>>>     On 2/12/14, 1:19 PM, Ariel Valentin wrote:
>>>>
>>>>         Sorry but I am not at liberty to be specific about our business
>>>>         problem.
>>>>
>>>>         Typical usage is multiple clients writing data to tables, which
>>>>         scan to avoid duplicate entries.
>>>>
>>>>         Ariel Valentin
>>>>         e-mail: ariel@arielvalentin.com
>>>>         website: http://blog.arielvalentin.com
>>>>         skype: ariel.s.valentin
>>>>         twitter: arielvalentin
>>>>         linkedin: http://www.linkedin.com/profile/view?id=8996534
>>>>         ---------------------------------------
>>>>         *simplicity *communication
>>>>         *feedback *courage *respect
>>>>
>>>>
>>>>         On Wed, Feb 12, 2014 at 10:59 AM, Josh Elser
>>>>         <josh.elser@gmail.com> wrote:
>>>>
>>>>              Also, I forgot this part before:
>>>>
>>>>              The ZooCache instance that's used *typically* comes from the
>>>>              Instance object that your Connector was created from. In other
>>>>              words, if you create multiple Instances (ZooKeeperInstance,
>>>>              usually), you can get multiple ZooCaches which means that
>>>>              concurrent calls to methods off of those objects should not
>>>>              block one another (createScanner off of connector1 from
>>>>              instance1 should not block createScanner off of connector2
>>>>              from instance2).
>>>>
>>>>              That should be something quick you can play with if you so
>>>>              desire.
>>>>
>>>>
>>>>              On 2/12/14, 9:57 AM, Josh Elser wrote:
>>>>
>>>>                  Yep, you'll likely also block on BatchScanner, anything in
>>>>                  TableOperations, and a host of other things.
>>>>
>>>>                  For scanners, there's likely a standing recommendation to
>>>>                  amortize the use of those objects (if you want to look up
>>>>                  5 ranges, don't make 5 scanners).
>>>>
>>>>                  Creating a cache per member in the work would likely
>>>>                  require some kind of Paxos implementation to provide
>>>>                  consistency, which is highly undesirable.
>>>>
>>>>                  One thing I'm curious about is the impact of removing
>>>>                  ZooCache altogether from things like the client API and
>>>>                  seeing what happens. I don't have a good way to measure
>>>>                  that impact off the top of my head, though.
>>>>
>>>>                  Anyways, is this causing you problems in your usage of
>>>>                  the API? Could you elaborate a bit more on the specifics?
>>>>
>>>>                  On Feb 12, 2014 4:48 AM, "Ariel Valentin"
>>>>                  <ariel@arielvalentin.com> wrote:
>>>>
>>>>                       I have run into a problem related to ACCUMULO-1833,
>>>>                       which appears to have addressed the issue for
>>>>                       MultiTableBatchWriter; however, I am seeing this issue
>>>>                       on the scanner side also:
>>>>
>>>>                       "http-/192.168.220.196:8080-35" daemon prio=10 tid=0x00007f3108038000 nid=0x538a waiting for monitor entry [0x00007f31287d1000]
>>>>                          java.lang.Thread.State: BLOCKED (on object monitor)
>>>>                           at org.apache.accumulo.fate.zookeeper.ZooCache.getInstance(ZooCache.java:301)
>>>>                           - waiting to lock <0x00000000fa64f5b8> (a java.lang.Class for org.apache.accumulo.fate.zookeeper.ZooCache)
>>>>                           at org.apache.accumulo.core.client.impl.Tables.getZooCache(Tables.java:40)
>>>>                           at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:44)
>>>>                           at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
>>>>                           at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
>>>>                           at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
>>>>                           at org.apache.accumulo.core.client.impl.ConnectorImpl.createScanner(ConnectorImpl.java:137)
>>>>
>>>>
>>>>
>>>>                       I have not spent enough time reasoning about the code
>>>>                       to understand all of the nuances but I am interested
>>>>                       in knowing if there are any mitigating strategies for
>>>>                       dealing with this thread contention e.g. would
>>>>                       creating a cache entry for each member of the
>>>>                       Zookeeper ensemble help relieve the strain? use
>>>>                       multiple classloaders? or is my only option to spawn
>>>>                       multiple JVMs?
>>>>
>>>>                       Thanks,
>>>>
>>>>                       Ariel Valentin
>>>>                       e-mail: ariel@arielvalentin.com
>>>>                       website: http://blog.arielvalentin.com
>>>>                       skype: ariel.s.valentin
>>>>                       twitter: arielvalentin
>>>>                       linkedin: http://www.linkedin.com/profile/view?id=8996534
>>>>                       ---------------------------------------
>>>>                       *simplicity *communication
>>>>                       *feedback *courage *respect
>>>>
>>>>
>>>>
