accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Synchronized Access to ZooCache Causing Threads to Block
Date Thu, 13 Feb 2014 04:39:30 GMT
Sick! Thanks for sharing -- feedback is always welcome and appreciated.

On 2/12/14, 8:44 PM, Ariel Valentin wrote:
> Josh,
>
> We experimented with 1.5.1 today; our load test numbers seem to indicate a 10x
> performance improvement over 1.5.0 on a single JVM. We are running additional
> experiments over the next few days to see what happens when we move to multiple
> JVMs. Stay tuned.
>
> Thanks,
> Ariel
> ---
> Sent from my mobile device. Please excuse any errors.
>
>> On Feb 12, 2014, at 6:01 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>
>> Also, for completeness: I filed ACCUMULO-2362 to work on concurrent accesses to
>> the same instance in the same JVM.
>>
>> Also, I misspoke earlier: much of the lock contention comes out of the Tables
>> class, not from the Instance. ZooCache keeps a static map of instance to ZooCache,
>> and those ZooCaches are used by a wide breadth of API calls.
>>
>>> On 2/12/14, 3:58 PM, Josh Elser wrote:
>>> ACCUMULO-1833 was merged into 1.5.1-SNAPSHOT a long time ago. I probably
>>> never cleaned up the branch after I finished the ticket.
>>>
>>> I believe John Vines started looking at using Curator, but I think he
>>> decided in the end that there weren't significant gains to be had by
>>> using it. I'm sure he commented on the ticket he had for it.
>>>
>>>> On 2/12/14, 3:56 PM, Ariel Valentin wrote:
>>>> Is the 1833 branch going to be part of 1.5.1?
>>>> I recall reading somewhere that there was interest in using Curator to
>>>> simplify working with ZooKeeper. Is that still part of the release
>>>> roadmap?
>>>>
>>>> Thanks,
>>>> Ariel
>>>> ---
>>>> Sent from my mobile device. Please excuse any errors.
>>>>
>>>>> On Feb 12, 2014, at 3:13 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>>>>
>>>>> Great, that helps. Thanks for the info, Ariel!
>>>>>
>>>>> I think this might be an area we want to revisit in later versions of
>>>>> Accumulo to make the client API implementations a little more robust
>>>>> and supportive of concurrent usage.
>>>>>
>>>>>> On 2/12/14, 3:10 PM, Ariel Valentin wrote:
>>>>>> Josh,
>>>>>>
>>>>>> The symptom is that we hit a point where a single server seems
>>>>>> "unresponsive" but we do not see anything unusual going on in that
>>>>>> machine and it seems idle. No heavy CPU, no I/O wait, low load average;
>>>>>> however, when we add additional instances of the JVM our capacity seems
>>>>>> to increase linearly.
>>>>>>
>>>>>> Based on thread dumps and profiler stats it appears that under "heavy"
>>>>>> load most of our threads are blocked trying to access ZooCache.
>>>>>>
>>>>>>
>>>>>> Ariel Valentin
>>>>>> e-mail: ariel@arielvalentin.com
>>>>>> website: http://blog.arielvalentin.com
>>>>>> skype: ariel.s.valentin
>>>>>> twitter: arielvalentin
>>>>>> linkedin: http://www.linkedin.com/profile/view?id=8996534
>>>>>> ---------------------------------------
>>>>>> *simplicity *communication
>>>>>> *feedback *courage *respect
>>>>>>
>>>>>>
>>>>>> On Wed, Feb 12, 2014 at 1:41 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>>>>>
>>>>>>     Didn't mean to ask about the subject matter, but how you were using
>>>>>>     the API. Are you actually seeing contention on ZooCache?
>>>>>>
>>>>>>
>>>>>>     On 2/12/14, 1:19 PM, Ariel Valentin wrote:
>>>>>>
>>>>>>         Sorry but I am not at liberty to be specific about our business
>>>>>>         problem.
>>>>>>
>>>>>>         Typical usage is multiple clients writing data to tables, which
>>>>>>         scan to avoid duplicate entries.
>>>>>>
>>>>>>         Ariel Valentin
>>>>>>         e-mail: ariel@arielvalentin.com
>>>>>>         website: http://blog.arielvalentin.com
>>>>>>         skype: ariel.s.valentin
>>>>>>         twitter: arielvalentin
>>>>>>         linkedin: http://www.linkedin.com/profile/view?id=8996534
>>>>>>         ---------------------------------------
>>>>>>         *simplicity *communication
>>>>>>         *feedback *courage *respect
>>>>>>
>>>>>>
>>>>>>         On Wed, Feb 12, 2014 at 10:59 AM, Josh Elser <josh.elser@gmail.com> wrote:
>>>>>>
>>>>>>              Also, I forgot this part before:
>>>>>>
>>>>>>              The ZooCache instance that's used *typically* comes from the
>>>>>>              Instance object that your Connector was created from. In other
>>>>>>              words, if you create multiple Instances (ZooKeeperInstance,
>>>>>>              usually), you can get multiple ZooCaches, which means that
>>>>>>              concurrent calls to methods off of those objects should not
>>>>>>              block one another (createScanner off of connector1 from
>>>>>>              instance1 should not block createScanner off of connector2
>>>>>>              from instance2).
>>>>>>
>>>>>>              That should be something quick you can play with if you so
>>>>>>              desire.
>>>>>>
>>>>>>
>>>>>>              On 2/12/14, 9:57 AM, Josh Elser wrote:
>>>>>>
>>>>>>                  Yep, you'll likely also block on BatchScanner, anything in
>>>>>>                  TableOperations, and a host of other things.
>>>>>>
>>>>>>                  For scanners, there's likely a standing recommendation to
>>>>>>                  amortize the use of those objects (if you want to look up
>>>>>>                  5 ranges, don't make 5 scanners).
>>>>>>
>>>>>>                  Creating a cache per member in the work would likely require
>>>>>>                  some kind of Paxos implementation to provide consistency,
>>>>>>                  which is highly undesirable.
>>>>>>
>>>>>>                  One thing I'm curious about is the impact of removing ZooCache
>>>>>>                  altogether from things like the client API and seeing what
>>>>>>                  happens. I don't have a good way to measure that impact off
>>>>>>                  the top of my head, though.
>>>>>>
>>>>>>                  Anyways, is this causing you problems in your usage of the
>>>>>>                  API? Could you elaborate a bit more on the specifics?
>>>>>>
>>>>>>                  On Feb 12, 2014 4:48 AM, "Ariel Valentin" <ariel@arielvalentin.com> wrote:
>>>>>>
>>>>>>                       I have run into a problem related to ACCUMULO-1833, which
>>>>>>                       appears to have addressed the issue for
>>>>>>                       MultiTableBatchWriter; however, I am seeing this issue on
>>>>>>                       the scanner side also:
>>>>>>
>>>>>>                       "http-/192.168.220.196:8080-35" daemon prio=10 tid=0x00007f3108038000 nid=0x538a waiting for monitor entry [0x00007f31287d1000]
>>>>>>                          java.lang.Thread.State: BLOCKED (on object monitor)
>>>>>>                           at org.apache.accumulo.fate.zookeeper.ZooCache.getInstance(ZooCache.java:301)
>>>>>>                           - waiting to lock <0x00000000fa64f5b8> (a java.lang.Class for org.apache.accumulo.fate.zookeeper.ZooCache)
>>>>>>                           at org.apache.accumulo.core.client.impl.Tables.getZooCache(Tables.java:40)
>>>>>>                           at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:44)
>>>>>>                           at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
>>>>>>                           at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
>>>>>>                           at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
>>>>>>                           at org.apache.accumulo.core.client.impl.ConnectorImpl.createScanner(ConnectorImpl.java:137)
>>>>>>
>>>>>>
>>>>>>
>>>>>>                       I have not spent enough time reasoning about the code to
>>>>>>                       understand all of the nuances, but I am interested in
>>>>>>                       knowing if there are any mitigating strategies for dealing
>>>>>>                       with this thread contention, e.g. would creating a cache
>>>>>>                       entry for each member of the ZooKeeper ensemble help
>>>>>>                       relieve the strain? Use multiple classloaders? Or is my
>>>>>>                       only option to spawn multiple JVMs?
>>>>>>
>>>>>>                       Thanks,
>>>>>>
>>>>>>                       Ariel Valentin
>>>>>>                       e-mail: ariel@arielvalentin.com
>>>>>>                       website: http://blog.arielvalentin.com
>>>>>>                       skype: ariel.s.valentin
>>>>>>                       twitter: arielvalentin
>>>>>>                       linkedin: http://www.linkedin.com/profile/view?id=8996534
>>>>>>                       ---------------------------------------
>>>>>>
>>>>>>                       *simplicity *communication
>>>>>>                       *feedback *courage *respect
>>>>>>
>>>>>>
>>>>>>
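
For readers following the thread, the blocking pattern in the quoted stack trace (threads waiting on the Class monitor inside ZooCache.getInstance) can be illustrated with a small, self-contained sketch. This is not Accumulo's code: the class names InstanceCache and CacheRegistry, the "zooKeepers:timeout" key format, and both lookup methods are hypothetical, and the lock-free variant assumes Java 8 or later. It only contrasts a static synchronized lookup, where every caller serializes on one class-wide lock, with a ConcurrentHashMap-based lookup that still returns one shared cache per key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a per-instance cache; not Accumulo's ZooCache. */
final class InstanceCache {
    final String key;

    InstanceCache(String key) {
        this.key = key;
    }
}

/** Hypothetical registry showing two ways to share one cache per key. */
final class CacheRegistry {
    private static final Map<String, InstanceCache> LOCKED = new HashMap<>();
    private static final ConcurrentHashMap<String, InstanceCache> CONCURRENT =
            new ConcurrentHashMap<>();

    private CacheRegistry() {}

    // Class-level lock: every caller, for every key, waits on the same
    // monitor (the CacheRegistry Class object), even for a plain lookup.
    static synchronized InstanceCache getLocked(String zooKeepers, int sessionTimeout) {
        String key = zooKeepers + ":" + sessionTimeout; // assumed key format
        InstanceCache cache = LOCKED.get(key);
        if (cache == null) {
            cache = new InstanceCache(key);
            LOCKED.put(key, cache);
        }
        return cache;
    }

    // No class-wide lock: lookups of existing keys generally do not block
    // each other, and computeIfAbsent creates each cache at most once.
    static InstanceCache getConcurrent(String zooKeepers, int sessionTimeout) {
        String key = zooKeepers + ":" + sessionTimeout; // assumed key format
        return CONCURRENT.computeIfAbsent(key, InstanceCache::new);
    }
}
```

Both methods hand back the same object for repeated calls with the same arguments; the difference is only in how much callers contend with each other. Whether a change along these lines matches what ACCUMULO-2362 eventually did is not stated in this thread.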
