Subject: Re: Error stressing with pyaccumulo app
From: Diego Woitasen
To: user@accumulo.apache.org
Date: Mon, 10 Feb 2014 18:38:23 -0200

Hi,
  I tried increasing tserver.memory.maps.max to 3G and the test failed
again, but with a different error. I have a heap size of 3G and 7.5 GB
of total RAM.

The error that I found in the tserver log is:

2014-02-08 03:37:35,497 [util.TServerUtils$THsHaServer] WARN : Got an
IOException in internalRead!

The tserver itself didn't crash, but the client was disconnected during
the test. Another hint is welcome :)
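P.S. Sanity-checking your rule of thumb from below against these numbers
(assuming the 3G heap above is the tserver's JVM heap):

    7.5 GB total RAM > 3 GB JVM heap + 3 GB tserver.memory.maps.max

The inequality holds, but only by about 1.5 GB, and the OS and everything
else running on the box have to fit in that remainder.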
On Mon, Feb 3, 2014 at 3:58 PM, Josh Elser wrote:
> Oh, ok. So that isn't quite as bad as it seems.
>
> The "commits are held" exception is thrown when the tserver is running low
> on memory. The tserver will block new mutations coming in until it can
> process the ones it already has and free up some memory. It makes sense
> that you would see this more often when you have more proxy servers, as the
> total number of Mutations you can send to your Accumulo instance is
> increased. With one proxy server, your tserver had enough memory to process
> the incoming data. With many proxy servers, your tservers would likely fall
> over eventually because they'll get bogged down in JVM garbage collection.
>
> If you have more memory that you can give the tservers, that would help.
> Also, you should make sure that you're using the Accumulo native maps, as
> these use off-JVM-heap space instead of JVM heap, which should help
> tremendously with your ingest rates.
>
> Native maps should be on by default unless you turned them off using the
> property 'tserver.memory.maps.native.enabled' in accumulo-site.xml.
> Additionally, you can try increasing the size of the native maps using
> 'tserver.memory.maps.max' in accumulo-site.xml. Just be aware that with
> the native maps, you need to ensure that total_ram > JVM_heap +
> tserver.memory.maps.max.
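>
> For example, the relevant accumulo-site.xml entries would look roughly
> like this (just a sketch; the 4G value is only an illustration, pick a
> size that satisfies the inequality above on your hardware):
>
>   <property>
>     <name>tserver.memory.maps.native.enabled</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>tserver.memory.maps.max</name>
>     <value>4G</value>
>   </property>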
>
> - Josh
>
> On 2/3/14, 1:33 PM, Diego Woitasen wrote:
>>
>> I've launched the cluster again and I was able to reproduce the error.
>>
>> On the proxy I got the same error that I mentioned in one of my previous
>> messages, about a failure in a tablet server. I checked the log of that
>> tablet server and I found this, many times:
>>
>> 2014-02-03 18:02:24,065 [thrift.ProcessFunction] ERROR: Internal error
>> processing update
>> org.apache.accumulo.server.tabletserver.HoldTimeoutException: Commits are
>> held
>>
>> Full log, if someone wants to have a look:
>>
>> http://www.vhgroup.net/diegows/tserver_matrix-slave-07.accumulo-ec2-test.com.debug.log
>>
>> Regards,
>> Diego
>>
>> On Mon, Feb 3, 2014 at 12:11 PM, Josh Elser wrote:
>>>
>>> I would assume that the proxy service would become a bottleneck fairly
>>> quickly and your throughput would benefit from running multiple proxies,
>>> but I don't have substantive numbers to back up that assertion.
>>>
>>> I'll put this on my list and see if I can reproduce something.
>>>
>>> On 2/3/14, 7:42 AM, Diego Woitasen wrote:
>>>>
>>>> I have to run the tests again because they were EC2 instances and I've
>>>> destroyed them. It's easy to reproduce, BTW.
>>>>
>>>> My question is: does it make sense to run multiple proxies? Is there
>>>> a limit? Right now I'm trying with 10 nodes and 10 proxies (one running
>>>> on every node). Maybe that doesn't make sense, or it's a buggy
>>>> configuration.
>>>>
>>>> On Fri, Jan 31, 2014 at 7:29 PM, Josh Elser wrote:
>>>>>
>>>>> When you had multiple proxies, what were the failures on that tablet
>>>>> server (10.202.6.46:9997)?
>>>>>
>>>>> I'm curious why using one proxy didn't cause errors but multiple did.
>>>>>
>>>>> On 1/31/14, 4:44 PM, Diego Woitasen wrote:
>>>>>>
>>>>>> I've reproduced the error and I've found this in the proxy logs:
>>>>>>
>>>>>> 2014-01-31 19:47:50,430 [server.THsHaServer] WARN : Got an
>>>>>> IOException in internalRead!
>>>>>> java.io.IOException: Connection reset by peer
>>>>>>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>>>>>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>>>>>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>>>>>>         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>>>>>>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>>>>>>         at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:141)
>>>>>>         at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:515)
>>>>>>         at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:305)
>>>>>>         at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:202)
>>>>>>         at org.apache.thrift.server.TNonblockingServer$SelectAcceptThread.select(TNonblockingServer.java:198)
>>>>>>         at org.apache.thrift.server.TNonblockingServer$SelectAcceptThread.run(TNonblockingServer.java:154)
>>>>>> 2014-01-31 19:51:13,185 [impl.ThriftTransportPool] WARN : Server
>>>>>> 10.202.6.46:9997:9997 (30000) had 20 failures in a short time period,
>>>>>> will not complain anymore
>>>>>>
>>>>>> A lot of these messages appear in all the proxies.
>>>>>>
>>>>>> I tried the same stress test against one proxy and I was able to
>>>>>> increase the load without getting any errors.
>>>>>>
>>>>>> Regards,
>>>>>> Diego
>>>>>>
>>>>>> On Thu, Jan 30, 2014 at 2:47 PM, Keith Turner wrote:
>>>>>>>
>>>>>>> Do you see more information in the proxy logs? "# exceptions 1"
>>>>>>> indicates an unexpected exception occurred in the batch writer client
>>>>>>> code. The proxy uses this client code, so maybe there will be a more
>>>>>>> detailed stack trace in its logs.
>>>>>>>
>>>>>>> On Thu, Jan 30, 2014 at 9:46 AM, Diego Woitasen wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>   I'm testing with a ten-node cluster with the proxy enabled on all
>>>>>>>> the nodes. I'm running a stress test, balancing connections across
>>>>>>>> the proxies using round robin. When I increase the load (400 workers
>>>>>>>> writing) I get this error:
>>>>>>>>
>>>>>>>> AccumuloSecurityException:
>>>>>>>> AccumuloSecurityException(msg='org.apache.accumulo.core.client.MutationsRejectedException:
>>>>>>>> # constraint violations : 0 security codes: [] # server errors 0 #
>>>>>>>> exceptions 1')
>>>>>>>>
>>>>>>>> The complete message is:
>>>>>>>>
>>>>>>>> AccumuloSecurityException:
>>>>>>>> AccumuloSecurityException(msg='org.apache.accumulo.core.client.MutationsRejectedException:
>>>>>>>> # constraint violations : 0 security codes: [] # server errors 0 #
>>>>>>>> exceptions 1')
>>>>>>>> kvlayer-test client failed!
>>>>>>>> Traceback (most recent call last):
>>>>>>>>   File "tests/kvlayer/test_accumulo_throughput.py", line 64, in __call__
>>>>>>>>     self.client.put('t1', ((u,), self.one_mb))
>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/kvlayer-0.2.7-py2.7.egg/kvlayer/_decorators.py", line 26, in wrapper
>>>>>>>>     return method(*args, **kwargs)
>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/kvlayer-0.2.7-py2.7.egg/kvlayer/_accumulo.py", line 154, in put
>>>>>>>>     batch_writer.close()
>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/__init__.py", line 126, in close
>>>>>>>>     self._conn.client.closeWriter(self._writer)
>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/proxy/AccumuloProxy.py", line 3149, in closeWriter
>>>>>>>>     self.recv_closeWriter()
>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/proxy/AccumuloProxy.py", line 3172, in recv_closeWriter
>>>>>>>>     raise result.ouch2
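>>>>>>>>
>>>>>>>> For context, each worker's write path through the proxy looks
>>>>>>>> roughly like this (a simplified sketch, not our exact code; the
>>>>>>>> host names are hypothetical and the 42424 port is just the
>>>>>>>> pyaccumulo default):
>>>>>>>>
>>>>>>>> import itertools
>>>>>>>> from pyaccumulo import Accumulo, Mutation
>>>>>>>>
>>>>>>>> # round-robin over the ten proxies, one per node (hypothetical names)
>>>>>>>> proxies = itertools.cycle([("node%02d" % i, 42424) for i in range(1, 11)])
>>>>>>>>
>>>>>>>> def write_batch(rows):
>>>>>>>>     host, port = next(proxies)
>>>>>>>>     conn = Accumulo(host=host, port=port, user="root", password="secret")
>>>>>>>>     writer = conn.create_batch_writer("t1")
>>>>>>>>     for row, value in rows:
>>>>>>>>         m = Mutation(row)
>>>>>>>>         m.put(cf="cf", cq="cq", val=value)
>>>>>>>>         writer.add_mutation(m)
>>>>>>>>     # the MutationsRejectedException above surfaces here, when the
>>>>>>>>     # proxy flushes the batch writer on close
>>>>>>>>     writer.close()
>>>>>>>>     conn.close()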
>>>>>>>>
>>>>>>>> I'm not sure if the error is caused by the way I'm using the
>>>>>>>> cluster with multiple proxies; maybe I should use just one.
>>>>>>>>
>>>>>>>> Ideas are welcome.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Diego
>>>>>>>>
>>>>>>>> --
>>>>>>>> Diego Woitasen
>>>>>>>> VHGroup - Linux and Open Source solutions architect
>>>>>>>> www.vhgroup.net

--
Diego Woitasen
VHGroup - Linux and Open Source solutions architect
www.vhgroup.net