accumulo-user mailing list archives

From Ariel Valentin <ar...@arielvalentin.com>
Subject Re: Synchronized Access to ZooCache Causing Threads to Block
Date Thu, 13 Feb 2014 01:44:15 GMT
Josh,

We experimented with 1.5.1 today; our load test numbers seem to indicate a 10x performance
improvement over 1.5.0 on a single JVM. We are running additional experiments over the next
few days to see what happens when we move to multiple JVMs. Stay tuned.

Thanks,
Ariel
---
Sent from my mobile device. Please excuse any errors.

> On Feb 12, 2014, at 6:01 PM, Josh Elser <josh.elser@gmail.com> wrote:
> 
> Also, for completeness: I filed ACCUMULO-2362 to work on concurrent accesses to the same instance in the same JVM.
> 
> Also, I misspoke earlier: much of the lock contention comes out of the Tables class, not from the Instance. ZooCache keeps a static map from instance to ZooCache, and those ZooCaches are used by a wide range of API calls.
> 
>> On 2/12/14, 3:58 PM, Josh Elser wrote:
>> ACCUMULO-1833 was merged into 1.5.1-SNAPSHOT a long time ago. I probably
>> never cleaned up the branch after I finished the ticket.
>> 
>> I believe John Vines started looking at using Curator, but I think he
>> decided in the end that there weren't significant gains to be had by
>> using it. I'm sure he commented on the ticket he had for it.
>> 
>>> On 2/12/14, 3:56 PM, Ariel Valentin wrote:
>>> Is the 1833 branch going to be part of 1.5.1?
>>> I recall reading somewhere that there was interest in using Curator to
>>> ameliorate working with zookeeper. Is that still part of the release
>>> roadmap?
>>> 
>>> Thanks,
>>> Ariel
>>> ---
>>> Sent from my mobile device. Please excuse any errors.
>>> 
>>>> On Feb 12, 2014, at 3:13 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>>> 
>>>> Great, that helps. Thanks for the info, Ariel!
>>>> 
>>>> I think this might be an area we want to revisit in later versions of
>>>> Accumulo to make the client API implementations a little more robust
>>>> and supportive of concurrent usage.
>>>> 
>>>>> On 2/12/14, 3:10 PM, Ariel Valentin wrote:
>>>>> Josh,
>>>>> 
>>>>> The symptom is that we hit a point where a single server seems
>>>>> "unresponsive" but we do not see anything unusual going on on that
>>>>> machine and it seems idle. No heavy CPU, no I/O wait, low load average;
>>>>> however when we add additional instances of the JVM our capacity seems
>>>>> to increase linearly.
>>>>> 
>>>>> Based on thread dumps and profiler stats it appears that under "heavy"
>>>>> load most of our threads are blocked trying to access ZooCache.
>>>>> 
>>>>> 
>>>>> Ariel Valentin
>>>>> e-mail: ariel@arielvalentin.com <mailto:ariel@arielvalentin.com>
>>>>> website: http://blog.arielvalentin.com
>>>>> skype: ariel.s.valentin
>>>>> twitter: arielvalentin
>>>>> linkedin: http://www.linkedin.com/profile/view?id=8996534
>>>>> ---------------------------------------
>>>>> *simplicity *communication
>>>>> *feedback *courage *respect
>>>>> 
>>>>> 
>>>>> On Wed, Feb 12, 2014 at 1:41 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>>>> 
>>>>>    Didn't mean to ask about the subject matter, but how you were using
>>>>>    the API. Are you actually seeing contention on ZooCache?
>>>>> 
>>>>> 
>>>>>    On 2/12/14, 1:19 PM, Ariel Valentin wrote:
>>>>> 
>>>>>        Sorry but I am not at liberty to be specific about our business
>>>>>        problem.
>>>>> 
>>>>>        Typical usage is multiple clients writing data to tables, which
>>>>>        scan to
>>>>>        avoid duplicate entries.
>>>>> 
>>>>>        Ariel Valentin
>>>>>        e-mail: ariel@arielvalentin.com
>>>>>        website: http://blog.arielvalentin.com
>>>>>        skype: ariel.s.valentin
>>>>>        twitter: arielvalentin
>>>>>        linkedin: http://www.linkedin.com/profile/view?id=8996534
>>>>>        ---------------------------------------
>>>>>        *simplicity *communication
>>>>>        *feedback *courage *respect
>>>>> 
>>>>> 
>>>>>        On Wed, Feb 12, 2014 at 10:59 AM, Josh Elser <josh.elser@gmail.com> wrote:
>>>>> 
>>>>>             Also, I forgot this part before:
>>>>> 
>>>>>             The ZooCache instance that's used *typically* comes from the
>>>>>             Instance object that your Connector was created from. In other
>>>>>             words, if you create multiple Instances (ZooKeeperInstance,
>>>>>             usually), you can get multiple ZooCaches which means that
>>>>>             concurrent calls to methods off of those objects should not block
>>>>>             one another (createScanner off of connector1 from instance1 should
>>>>>             not block createScanner off of connector2 from instance2).
>>>>> 
>>>>>             That should be something quick you can play with if you so desire.
>>>>> 
>>>>> 
>>>>>             On 2/12/14, 9:57 AM, Josh Elser wrote:
>>>>> 
>>>>>                 Yep, you'll likely also block on BatchScanner, anything in
>>>>>                 TableOperations, and a host of other things.
>>>>> 
>>>>>                 For scanners, there's likely a standing recommendation to
>>>>>                 amortize the use of those objects (if you want to look up 5
>>>>>                 ranges, don't make 5 scanners).
>>>>> 
>>>>>                 Creating a cache per member in the work would likely require
>>>>>                 some kind of Paxos implementation to provide consistency, which
>>>>>                 is highly undesirable.
>>>>> 
>>>>>                 One thing I'm curious about is the impact of removing ZooCache
>>>>>                 altogether from things like the client API, to see what
>>>>>                 happens. I don't have a good way to measure that impact off the
>>>>>                 top of my head though.
>>>>> 
>>>>>                 Anyways, is this causing you problems in your usage of the API?
>>>>>                 Could you elaborate a bit more on the specifics?
>>>>> 
>>>>>                 On Feb 12, 2014 4:48 AM, "Ariel Valentin" <ariel@arielvalentin.com> wrote:
>>>>> 
>>>>>                      I have run into a problem related to ACCUMULO-1833, which
>>>>>                      appears to have addressed the issue for MultiTableBatchWriter;
>>>>>                      however I am seeing this issue on the scanner side also:
>>>>> 
>>>>>                      394750-"http-/192.168.220.196:8080-35" daemon prio=10 tid=0x00007f3108038000 nid=0x538a waiting for monitor entry [0x00007f31287d1000]
>>>>>                      394878:   java.lang.Thread.State: BLOCKED (on object monitor)
>>>>>                      394933- at org.apache.accumulo.fate.zookeeper.ZooCache.getInstance(ZooCache.java:301)
>>>>>                      395012- - waiting to lock <0x00000000fa64f5b8> (a java.lang.Class for org.apache.accumulo.fate.zookeeper.ZooCache)
>>>>>                      395120- at org.apache.accumulo.core.client.impl.Tables.getZooCache(Tables.java:40)
>>>>>                      395196- at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:44)
>>>>>                      395267- at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
>>>>>                      395346- at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
>>>>>                      395421- at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
>>>>>                      395510- at org.apache.accumulo.core.client.impl.ConnectorImpl.createScanner(ConnectorImpl.java:137)
>>>>> 
>>>>> 
>>>>> 
>>>>>                      I have not spent enough time reasoning about the code to
>>>>>                      understand all of the nuances but I am interested in knowing
>>>>>                      if there are any mitigating strategies for dealing with this
>>>>>                      thread contention e.g. would creating a cache entry for each
>>>>>                      member of the Zookeeper ensemble help relieve the strain? use
>>>>>                      multiple classloaders? or is my only option to spawn multiple
>>>>>                      JVMs?
>>>>> 
>>>>>                      Thanks,
>>>>> 
>>>>>                      Ariel Valentin
>>>>>                      e-mail: ariel@arielvalentin.com
>>>>>                      website: http://blog.arielvalentin.com
>>>>>                      skype: ariel.s.valentin
>>>>>                      twitter: arielvalentin
>>>>>                      linkedin: http://www.linkedin.com/profile/view?id=8996534
>>>>>                      ---------------------------------------
>>>>>                      *simplicity *communication
>>>>>                      *feedback *courage *respect
>>>>> 
>>>>> 
>>>>> 
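
For readers following the thread dump above: the blocked frame is waiting on the monitor of the ZooCache class object, which is what a static synchronized method locks. The sketch below only illustrates that general pattern and one common alternative. It is not Accumulo's code and not the change made for ACCUMULO-1833 or ACCUMULO-2362; the names CacheRegistrySketch, Cache, getInstanceLocked and getInstanceConcurrent are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class CacheRegistrySketch {
    /** Hypothetical stand-in for a per-instance cache; not Accumulo's ZooCache. */
    static final class Cache {
        final String key;
        Cache(String key) { this.key = key; }
    }

    // Pattern suggested by the thread dump: a static synchronized factory.
    // Every caller, whatever key it asks for, must take the one class-level monitor.
    private static final Map<String, Cache> lockedCaches = new HashMap<>();

    static synchronized Cache getInstanceLocked(String key) {
        Cache cache = lockedCaches.get(key);
        if (cache == null) {
            cache = new Cache(key);
            lockedCaches.put(key, cache);
        }
        return cache;
    }

    // One common alternative (an assumption, not the project's fix): a concurrent map.
    // Lookups of an existing key do not take a global lock; creation is atomic per key.
    private static final ConcurrentHashMap<String, Cache> concurrentCaches =
            new ConcurrentHashMap<>();

    static Cache getInstanceConcurrent(String key) {
        Cache cache = concurrentCaches.get(key);
        return cache != null ? cache : concurrentCaches.computeIfAbsent(key, Cache::new);
    }

    public static void main(String[] args) throws InterruptedException {
        Set<Cache> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 100_000; n++) {
                    seen.add(getInstanceConcurrent("instance-a"));
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        // All threads should have observed the same Cache object for the same key.
        System.out.println("distinct caches for one key: " + seen.size());
    }
}
```

Both factories hand back one shared object per key. The difference is that the synchronized version serializes every lookup in the JVM, which would be consistent with an idle-looking machine whose request threads are all BLOCKED.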
