Message-ID: <52FC4C82.2070502@gmail.com>
Date: Wed, 12 Feb 2014 23:39:30 -0500
From: Josh Elser
To: user@accumulo.apache.org
Subject: Re: Synchronized Access to ZooCache Causing Threads to Block

Sick! Thanks for sharing -- feedback is always welcome and appreciated.

On 2/12/14, 8:44 PM, Ariel Valentin wrote:
> Josh,
>
> We experimented with 1.5.1 today; our load test numbers seem to indicate a 10x performance improvement over 1.5.0 on a single JVM. We are running additional experiments over the next few days to see what happens when we move to multiple JVMs. Stay tuned.
>
> Thanks,
> Ariel
> ---
> Sent from my mobile device. Please excuse any errors.
>
>> On Feb 12, 2014, at 6:01 PM, Josh Elser wrote:
>>
>> Also, for completeness: I filed ACCUMULO-2362 to work on concurrent accesses to the same instance in the same JVM.
>>
>> Also, I misspoke earlier: much of the lock contention comes out of the Tables class, not from the Instance.
>> ZooCache keeps a static map of instance to ZooCache which are used by a wide breadth of API calls.
>>
>>> On 2/12/14, 3:58 PM, Josh Elser wrote:
>>> ACCUMULO-1833 was merged into 1.5.1-SNAPSHOT a long time ago. I probably
>>> never cleaned up the branch after I finished the ticket.
>>>
>>> I believe John Vines started looking at using Curator, but I think he
>>> decided in the end that there weren't significant gains to be had by
>>> using it. I'm sure he commented on the ticket he had for it.
>>>
>>>> On 2/12/14, 3:56 PM, Ariel Valentin wrote:
>>>> Is the 1833 branch going to be part of 1.5.1?
>>>> I recall reading somewhere that there was interest in using Curator to
>>>> ameliorate working with zookeeper. Is that still part of the release
>>>> roadmap?
>>>>
>>>> Thanks,
>>>> Ariel
>>>>
>>>>> On Feb 12, 2014, at 3:13 PM, Josh Elser wrote:
>>>>>
>>>>> Great, that helps. Thanks for the info, Ariel!
>>>>>
>>>>> I think this might be an area we want to revisit in later versions of
>>>>> Accumulo to make the client API implementations a little more robust
>>>>> and supportive of concurrent usage.
>>>>>
>>>>>> On 2/12/14, 3:10 PM, Ariel Valentin wrote:
>>>>>> Josh,
>>>>>>
>>>>>> The symptom is that we hit a point where a single server seems
>>>>>> "unresponsive" but we do not see anything unusual going on in that
>>>>>> machine and it seems idle. No heavy CPU, no I/O wait, low load average;
>>>>>> however when we add additional instances of the JVM our capacity seems
>>>>>> to increase linearly.
>>>>>>
>>>>>> Based on thread dumps and profiler stats it appears that under "heavy"
>>>>>> load most of our threads are blocked trying to access ZooCache.
>>>>>>
>>>>>> Ariel Valentin
>>>>>> e-mail: ariel@arielvalentin.com
>>>>>> website: http://blog.arielvalentin.com
>>>>>> skype: ariel.s.valentin
>>>>>> twitter: arielvalentin
>>>>>> linkedin: http://www.linkedin.com/profile/view?id=8996534
>>>>>> ---------------------------------------
>>>>>> *simplicity *communication
>>>>>> *feedback *courage *respect
>>>>>>
>>>>>> On Wed, Feb 12, 2014 at 1:41 PM, Josh Elser wrote:
>>>>>>
>>>>>> Didn't mean to ask about the subject matter, but how you were using
>>>>>> the API. Are you actually seeing contention on ZooCache?
>>>>>>
>>>>>> On 2/12/14, 1:19 PM, Ariel Valentin wrote:
>>>>>>
>>>>>> Sorry but I am not at liberty to be specific about our business
>>>>>> problem.
>>>>>>
>>>>>> Typical usage is multiple clients writing data to tables, which
>>>>>> scan to avoid duplicate entries.
>>>>>>
>>>>>> On Wed, Feb 12, 2014 at 10:59 AM, Josh Elser wrote:
>>>>>>
>>>>>> Also, I forgot this part before:
>>>>>>
>>>>>> The ZooCache instance that's used *typically* comes from the
>>>>>> Instance object that your Connector was created from. In other
>>>>>> words, if you create multiple Instances (ZooKeeperInstance,
>>>>>> usually), you can get multiple ZooCaches which means that
>>>>>> concurrent calls to methods off of those objects should not block
>>>>>> one another (createScanner off of connector1 from instance1 should
>>>>>> not block createScanner off of connector2 from instance2).
>>>>>>
>>>>>> That should be something quick you can play with if you so desire.
>>>>>>
>>>>>> On 2/12/14, 9:57 AM, Josh Elser wrote:
>>>>>>
>>>>>> Yep, you'll likely also block on BatchScanner, anything in
>>>>>> TableOperations, and a host of other things.
>>>>>>
>>>>>> For scanners, there's likely a standing recommendation to amortize
>>>>>> the use of those objects (if you want to look up 5 ranges, don't
>>>>>> make 5 scanners).
>>>>>>
>>>>>> Creating a cache per member in the work would likely require some
>>>>>> kind of Paxos implementation to provide consistency which is highly
>>>>>> undesirable.
>>>>>>
>>>>>> One thing I'm curious about is the impact of removing ZooCache
>>>>>> altogether from things like the client api and seeing what happens.
>>>>>> I don't have a good way to measure that impact off the top of my
>>>>>> head though.
>>>>>>
>>>>>> Anyways, is this causing you problems in your usage of the api?
>>>>>> Could you elaborate a bit more on the specifics?
>>>>>>
>>>>>> On Feb 12, 2014 4:48 AM, "Ariel Valentin" wrote:
>>>>>>
>>>>>> I have run into a problem related to ACCUMULO-1833, which appears to
>>>>>> have addressed the issue for MultiTableBatchWriter; however I am
>>>>>> seeing this issue on the scanner side also:
>>>>>>
>>>>>> "http-/192.168.220.196:8080-35" daemon prio=10 tid=0x00007f3108038000 nid=0x538a waiting for monitor entry [0x00007f31287d1000]
>>>>>>    java.lang.Thread.State: BLOCKED (on object monitor)
>>>>>>     at org.apache.accumulo.fate.zookeeper.ZooCache.getInstance(ZooCache.java:301)
>>>>>>     - waiting to lock <0x00000000fa64f5b8> (a java.lang.Class for org.apache.accumulo.fate.zookeeper.ZooCache)
>>>>>>     at org.apache.accumulo.core.client.impl.Tables.getZooCache(Tables.java:40)
>>>>>>     at org.apache.accumulo.core.client.impl.Tables.getMap(Tables.java:44)
>>>>>>     at org.apache.accumulo.core.client.impl.Tables.getNameToIdMap(Tables.java:78)
>>>>>>     at org.apache.accumulo.core.client.impl.Tables.getTableId(Tables.java:64)
>>>>>>     at org.apache.accumulo.core.client.impl.ConnectorImpl.getTableId(ConnectorImpl.java:75)
>>>>>>     at org.apache.accumulo.core.client.impl.ConnectorImpl.createScanner(ConnectorImpl.java:137)
>>>>>>
>>>>>> I have not spent enough time reasoning about the code to understand
>>>>>> all of the nuances but I am interested in knowing if there are any
>>>>>> mitigating strategies for dealing with this thread contention e.g.
>>>>>> would creating a cache entry for each member of the Zookeeper
>>>>>> ensemble help relieve the strain? use multiple classloaders? or is
>>>>>> my only option to spawn multiple JVMs?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Ariel Valentin
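
The thread dump in the original report shows workers waiting on the monitor of the ZooCache class object itself. That is what a static synchronized method (or a synchronized block on the class) produces: every caller serializes on one lock, even when the cached entry already exists. The sketch below is a minimal, self-contained Java illustration of that pattern next to one possible alternative that keeps cache hits off the lock. It is not Accumulo code and not the change made for ACCUMULO-2362; CacheRegistrySketch, Cache, getInstanceLocked and getInstanceConcurrent are hypothetical names, and the example assumes Java 8 or later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names come from Accumulo.
final class CacheRegistrySketch {

    /** Stand-in for a per-ensemble cache object. */
    static final class Cache {
        final String zooKeepers;
        Cache(String zooKeepers) { this.zooKeepers = zooKeepers; }
    }

    private static final Map<String, Cache> LOCKED = new HashMap<>();
    private static final ConcurrentHashMap<String, Cache> CONCURRENT = new ConcurrentHashMap<>();

    // Pattern matching the dump: every call takes the monitor of
    // CacheRegistrySketch.class, so threads queue up even on cache hits.
    static synchronized Cache getInstanceLocked(String zooKeepers) {
        Cache cache = LOCKED.get(zooKeepers);
        if (cache == null) {
            cache = new Cache(zooKeepers);
            LOCKED.put(zooKeepers, cache);
        }
        return cache;
    }

    // One alternative: get() on a ConcurrentHashMap does not block, so only
    // the first lookup of a key pays for synchronization inside the map.
    static Cache getInstanceConcurrent(String zooKeepers) {
        Cache cache = CONCURRENT.get(zooKeepers);
        return cache != null ? cache : CONCURRENT.computeIfAbsent(zooKeepers, Cache::new);
    }

    public static void main(String[] args) throws InterruptedException {
        final Cache expected = getInstanceConcurrent("zk1:2181");
        final AtomicInteger mismatches = new AtomicInteger();
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                // Each worker hammers the hit path, as many scanners would.
                for (int n = 0; n < 100_000; n++) {
                    if (getInstanceConcurrent("zk1:2181") != expected) {
                        mismatches.incrementAndGet();
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println("mismatches: " + mismatches.get());
    }
}
```

Swapping getInstanceConcurrent for getInstanceLocked in the worker loop and taking a thread dump under load should show the same "waiting to lock ... (a java.lang.Class for ...)" frames as in the report, which is a quick way to confirm the diagnosis on a given setup.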