accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Synchronized Access to ZooCache Causing Threads to Block
Date Wed, 12 Feb 2014 20:58:42 GMT
ACCUMULO-1833 was merged into 1.5.1-SNAPSHOT a long time ago. I probably 
never cleaned up the branch after I finished the ticket.

I believe John Vines started looking at using Curator, but I think he 
decided in the end that there weren't significant gains to be had by 
using it. I'm sure he commented on the ticket he had for it.

On 2/12/14, 3:56 PM, Ariel Valentin wrote:
> Is the 1833 branch going to be part of 1.5.1?
> I recall reading somewhere that there was interest in using Curator to simplify working
> with ZooKeeper. Is that still part of the release roadmap?
>
> Thanks,
> Ariel
> ---
> Sent from my mobile device. Please excuse any errors.
>
>> On Feb 12, 2014, at 3:13 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>
>> Great, that helps. Thanks for the info, Ariel!
>>
>> I think this might be an area we want to revisit in later versions of Accumulo to
>> make the client API implementations a little more robust and supportive of concurrent usage.
>>
>>> On 2/12/14, 3:10 PM, Ariel Valentin wrote:
>>> Josh,
>>>
>>> The symptom is that we hit a point where a single server seems
>>> "unresponsive", but we do not see anything unusual going on in that
>>> machine and it seems idle: no heavy CPU, no I/O wait, low load average.
>>> However, when we add additional instances of the JVM our capacity seems
>>> to increase linearly.
>>>
>>> Based on thread dumps and profiler stats it appears that under "heavy"
>>> load most of our threads are blocked trying to access ZooCache.
>>>
>>>
>>> Ariel Valentin
>>> e-mail: ariel@arielvalentin.com
>>> website: http://blog.arielvalentin.com
>>> skype: ariel.s.valentin
>>> twitter: arielvalentin
>>> linkedin: http://www.linkedin.com/profile/view?id=8996534
>>> ---------------------------------------
>>> *simplicity *communication
>>> *feedback *courage *respect
>>>
>>>
>>> On Wed, Feb 12, 2014 at 1:41 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>>
>>>     Didn't mean to ask about the subject matter, but how you were using
>>>     the API. Are you actually seeing contention on ZooCache?
>>>
>>>
>>>     On 2/12/14, 1:19 PM, Ariel Valentin wrote:
>>>
>>>         Sorry but I am not at liberty to be specific about our business
>>>         problem.
>>>
>>>         Typical usage is multiple clients writing data to tables,
>>>         scanning first to avoid duplicate entries.
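>>>
>>>         Roughly, something like the following sketch against the 1.5
>>>         client API (the instance name, ZooKeeper quorum, credentials,
>>>         table, and row id below are placeholders, not our real ones):
>>>
>>>         import org.apache.accumulo.core.client.BatchWriter;
>>>         import org.apache.accumulo.core.client.BatchWriterConfig;
>>>         import org.apache.accumulo.core.client.Connector;
>>>         import org.apache.accumulo.core.client.Instance;
>>>         import org.apache.accumulo.core.client.Scanner;
>>>         import org.apache.accumulo.core.client.ZooKeeperInstance;
>>>         import org.apache.accumulo.core.client.security.tokens.PasswordToken;
>>>         import org.apache.accumulo.core.data.Mutation;
>>>         import org.apache.accumulo.core.data.Range;
>>>         import org.apache.accumulo.core.data.Value;
>>>         import org.apache.accumulo.core.security.Authorizations;
>>>         import org.apache.hadoop.io.Text;
>>>
>>>         public class DedupWriteSketch {
>>>           public static void main(String[] args) throws Exception {
>>>             Instance instance = new ZooKeeperInstance("myInstance", "zk1:2181,zk2:2181,zk3:2181");
>>>             Connector conn = instance.getConnector("user", new PasswordToken("secret"));
>>>
>>>             String table = "records";
>>>             String rowId = "row-123";
>>>
>>>             // Scan first to see whether the row is already present.
>>>             // createScanner() resolves the table id via ZooCache, which is
>>>             // where the blocked threads in the thread dump are waiting.
>>>             Scanner scanner = conn.createScanner(table, Authorizations.EMPTY);
>>>             scanner.setRange(new Range(rowId));
>>>             boolean exists = scanner.iterator().hasNext();
>>>
>>>             // Only write if the row was not found.
>>>             if (!exists) {
>>>               BatchWriter writer = conn.createBatchWriter(table, new BatchWriterConfig());
>>>               Mutation m = new Mutation(new Text(rowId));
>>>               m.put(new Text("cf"), new Text("cq"), new Value("payload".getBytes()));
>>>               writer.addMutation(m);
>>>               writer.close();
>>>             }
>>>           }
>>>         }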
>>>
>>>         Ariel Valentin
>>>         e-mail: ariel@arielvalentin.com
>>>         website: http://blog.arielvalentin.com
>>>         skype: ariel.s.valentin
>>>         twitter: arielvalentin
>>>         linkedin: http://www.linkedin.com/profile/view?id=8996534
>>>         ---------------------------------------
>>>         *simplicity *communication
>>>         *feedback *courage *respect
>>>
>>>
>>>         On Wed, Feb 12, 2014 at 10:59 AM, Josh Elser <josh.elser@gmail.com> wrote:
>>>
>>>              Also, I forgot this part before:
>>>
>>>              The ZooCache instance that's used *typically* comes from the
>>>              Instance object that your Connector was created from. In other
>>>              words, if you create multiple Instances (ZooKeeperInstance,
>>>              usually), you can get multiple ZooCaches which means that
>>>         concurrent
>>>              calls to methods off of those objects should not block one
>>>         another
>>>              (createScanner off of connector1 from instance1 should not
>>>         block
>>>              createScanner off of connector2 from instance2).
>>>
>>>              That should be something quick you can play with if you so
>>>         desire.
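>>>
>>>              A rough sketch of that experiment (the instance name, quorum,
>>>              credentials, and table are placeholders); each Connector here
>>>              comes from its own ZooKeeperInstance, so per the above their
>>>              ZooCache lookups should not serialize on each other:
>>>
>>>              import org.apache.accumulo.core.client.Connector;
>>>              import org.apache.accumulo.core.client.Instance;
>>>              import org.apache.accumulo.core.client.Scanner;
>>>              import org.apache.accumulo.core.client.ZooKeeperInstance;
>>>              import org.apache.accumulo.core.client.security.tokens.PasswordToken;
>>>              import org.apache.accumulo.core.security.Authorizations;
>>>
>>>              public class SeparateInstancesSketch {
>>>                public static void main(String[] args) throws Exception {
>>>                  // Two distinct Instance objects, even though they point at the
>>>                  // same Accumulo instance and the same ZooKeeper quorum.
>>>                  Instance instance1 = new ZooKeeperInstance("myInstance", "zk1:2181,zk2:2181,zk3:2181");
>>>                  Instance instance2 = new ZooKeeperInstance("myInstance", "zk1:2181,zk2:2181,zk3:2181");
>>>
>>>                  Connector connector1 = instance1.getConnector("user", new PasswordToken("secret"));
>>>                  Connector connector2 = instance2.getConnector("user", new PasswordToken("secret"));
>>>
>>>                  // The idea above: these createScanner() calls go through
>>>                  // different ZooCaches, so they should not block one another
>>>                  // the way calls on a single shared Instance do.
>>>                  Scanner s1 = connector1.createScanner("mytable", Authorizations.EMPTY);
>>>                  Scanner s2 = connector2.createScanner("mytable", Authorizations.EMPTY);
>>>                }
>>>              }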
>>>
>>>
>>>              On 2/12/14, 9:57 AM, Josh Elser wrote:
>>>
>>>                  Yep, you'll likely also block on BatchScanner, anything in
>>>                  TableOperations, and a host of other things.
>>>
>>>                  For scanners, there's likely a standing recommendation to
>>>                  amortize the use of those objects (if you want to look up
>>>                  5 ranges, don't make 5 scanners).
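>>>
>>>                  For example, something like the following with a single
>>>                  BatchScanner handed all of the ranges at once instead of
>>>                  five Scanners (the table name, ranges, and query-thread
>>>                  count here are made up):
>>>
>>>                  import java.util.Arrays;
>>>                  import java.util.Map;
>>>
>>>                  import org.apache.accumulo.core.client.BatchScanner;
>>>                  import org.apache.accumulo.core.client.Connector;
>>>                  import org.apache.accumulo.core.data.Key;
>>>                  import org.apache.accumulo.core.data.Range;
>>>                  import org.apache.accumulo.core.data.Value;
>>>                  import org.apache.accumulo.core.security.Authorizations;
>>>
>>>                  public class AmortizedLookupSketch {
>>>                    // Looks up several rows with one BatchScanner rather than
>>>                    // one Scanner per range.
>>>                    static void lookup(Connector conn) throws Exception {
>>>                      BatchScanner bs = conn.createBatchScanner("mytable", Authorizations.EMPTY, 4);
>>>                      try {
>>>                        bs.setRanges(Arrays.asList(
>>>                            new Range("row1"), new Range("row2"), new Range("row3"),
>>>                            new Range("row4"), new Range("row5")));
>>>                        for (Map.Entry<Key,Value> entry : bs) {
>>>                          // process each key/value pair
>>>                          System.out.println(entry.getKey() + " -> " + entry.getValue());
>>>                        }
>>>                      } finally {
>>>                        bs.close();
>>>                      }
>>>                    }
>>>                  }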
>>>
>>>                  Creating a cache per member of the ZooKeeper ensemble would
>>>                  likely require some kind of Paxos implementation to provide
>>>                  consistency, which is highly undesirable.
>>>
>>>                  One thing I'm curious about is the impact of removing
>>>                  ZooCache altogether from things like the client API and
>>>                  seeing what happens. I don't have a good way to measure
>>>                  that impact off the top of my head, though.
>>>
>>>                  Anyways, is this causing you problems in your usage of
>>>                  the API? Could you elaborate a bit more on the specifics?
>>>
>>>                  On Feb 12, 2014 4:48 AM, "Ariel Valentin" <ariel@arielvalentin.com> wrote:
>>>
>>>                       I have run into a problem related to ACCUMULO-1833,
>>>                       which appears to have addressed the issue for
>>>                       MultiTableBatchWriter; however I am seeing this issue
>>>                       on the scanner side also:
>>>
>>>                       "http-/192.168.220.196:8080-35" daemon prio=10
>>>                       tid=0x00007f3108038000 nid=0x538a waiting for monitor entry
>>>                       [0x00007f31287d1000]
>>>                          java.lang.Thread.State: BLOCKED (on object monitor)
>>>                              at org.apache.accumulo.fate.zookeeper.ZooCache.getInstance(ZooCache.java:301)
>>>                              - waiting to lock <0x00000000fa64f5b8> (a java.lang.Class for org.apache.accumulo.fate.zookeeper.ZooCache)
>>>                              at org.apache.accumulo.core.client.impl.Tables.getZooCache(Tables.java:40)
>>>                              at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:44)
>>>                              at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
>>>                              at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
>>>                              at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
>>>                              at org.apache.accumulo.core.client.impl.ConnectorImpl.createScanner(ConnectorImpl.java:137)
>>>
>>>                       I have not spent enough time reasoning about the code
>>>                       to understand all of the nuances, but I am interested
>>>                       in knowing if there are any mitigation strategies for
>>>                       dealing with this thread contention. For example, would
>>>                       creating a cache entry for each member of the ZooKeeper
>>>                       ensemble help relieve the strain? Use multiple
>>>                       classloaders? Or is my only option to spawn multiple
>>>                       JVMs?
>>>
>>>                       Thanks,
>>>
>>>                       Ariel Valentin
>>>                       e-mail: ariel@arielvalentin.com
>>>
>>>
>>>                       website: http://blog.arielvalentin.com
>>>                       skype: ariel.s.valentin
>>>                       twitter: arielvalentin
>>>                       linkedin: http://www.linkedin.com/profile/view?id=8996534
>>>                       ---------------------------------------
>>>
>>>                       *simplicity *communication
>>>                       *feedback *courage *respect
>>>
>>>
>>>
