accumulo-user mailing list archives

From Diego Woitasen <diego.woita...@vhgroup.net>
Subject Re: Error stressing with pyaccumulo app
Date Mon, 10 Feb 2014 20:38:23 GMT
Hi,
 I tried increasing tserver.memory.maps.max to 3G and it failed
again, but with a different error. I have a heap size of 3G and 7.5 GB of
total RAM.
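
For reference, the sizing rule from Josh's earlier reply (quoted below),
total_ram > JVM_heap + tserver.memory.maps.max, can be checked with a few
lines of Python. This is only a sketch: the function names are made up for
illustration and are not part of Accumulo or pyaccumulo, and the numbers
are the ones from this message (7.5 GB RAM, 3G heap, 3G native maps).

```python
def native_map_headroom_gb(total_ram_gb, jvm_heap_gb, maps_max_gb):
    """RAM left over after the JVM heap and the native maps (hypothetical helper)."""
    return total_ram_gb - (jvm_heap_gb + maps_max_gb)


def fits_in_ram(total_ram_gb, jvm_heap_gb, maps_max_gb):
    """True when total_ram > JVM_heap + tserver.memory.maps.max."""
    return native_map_headroom_gb(total_ram_gb, jvm_heap_gb, maps_max_gb) > 0


if __name__ == "__main__":
    # Values from this message: 7.5 GB total RAM, 3G heap, 3G native maps.
    print(fits_in_ram(7.5, 3, 3))
    print(native_map_headroom_gb(7.5, 3, 3))
```

The rule holds for these values, but the remainder also has to cover the
OS and any other processes on the node (for example the proxy), so the
margin may be tighter than it looks.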

The error that I've found in the crashed tserver is:

2014-02-08 03:37:35,497 [util.TServerUtils$THsHaServer] WARN : Got an IOException in internalRead!

The tserver hasn't crashed, but the client was disconnected during the test.

Another hint is welcome :)

On Mon, Feb 3, 2014 at 3:58 PM, Josh Elser <josh.elser@gmail.com> wrote:
> Oh, ok. So that isn't quite as bad as it seems.
>
> The "commits are held" exception is thrown when the tserver is running low
> on memory. The tserver will block new mutations coming in until it can
> process the ones it already has and free up some memory. This makes sense
> that you would see this more often when you have more proxy servers as the
> total amount of Mutations you can send to your Accumulo instance is
> increased. With one proxy server, your tserver had enough memory to process
> the incoming data. With many proxy servers, your tservers would likely fall
> over eventually because they'll get bogged down in JVM garbage collection.
>
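
On the client side, one possible way to cope with held commits is to retry
a failed write with exponential backoff, so the tserver gets time to free
memory. The following is a minimal sketch under stated assumptions:
retry_with_backoff and its parameters are hypothetical names, not part of
pyaccumulo or the Accumulo proxy API, and the operation passed in is
assumed to create a fresh batch writer on every call. Whether a retry is
appropriate at all depends on the failure.

```python
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    The delay doubles after every failed attempt (0.5s, 1s, 2s, ...).
    The last failure is re-raised so the caller still sees the error.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

In a stress test, operation would be a small function that opens a batch
writer, adds the mutations and closes the writer, so each retry starts
from a clean writer.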
> If you have more memory that you can give the tservers, that would help.
> Also, you should make sure that you're using the Accumulo native maps as
> this will use off-JVM-heap space instead of JVM heap which should help
> tremendously with your ingest rates.
>
> Native maps should be on by default unless you turned them off using the
> property 'tserver.memory.maps.native.enabled' in accumulo-site.xml.
> Additionally, you can try increasing the size of the native maps using
> 'tserver.memory.maps.max' in accumulo-site.xml. Just be aware that with the
> native maps, you need to ensure that total_ram > JVM_heap +
> tserver.memory.maps.max
>
> - Josh
>
>
> On 2/3/14, 1:33 PM, Diego Woitasen wrote:
>>
>> I've launched the cluster again and I was able to reproduce the error:
>>
>> In the proxy I had the same error that I mentioned in one of my previous
>> messages, about a failure in a tablet server. I checked the log of that
>> tablet server and I found:
>>
>> 2014-02-03 18:02:24,065 [thrift.ProcessFunction] ERROR: Internal error processing update
>> org.apache.accumulo.server.tabletserver.HoldTimeoutException: Commits are held
>>
>> This error appears many times.
>>
>> Full log, if someone wants to have a look:
>>
>> http://www.vhgroup.net/diegows/tserver_matrix-slave-07.accumulo-ec2-test.com.debug.log
>>
>> Regards,
>>    Diego
>>
>> On Mon, Feb 3, 2014 at 12:11 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>>
>>> I would assume that that proxy service would become a bottleneck fairly
>>> quickly and your throughput would benefit from running multiple proxies,
>>> but I don't have substantive numbers to back up that assertion.
>>>
>>> I'll put this on my list and see if I can reproduce something.
>>>
>>>
>>> On 2/3/14, 7:42 AM, Diego Woitasen wrote:
>>>>
>>>>
>>>> I have to run the tests again because they were EC2 instances and I've
>>>> destroyed them. It's easy to reproduce, BTW.
>>>>
>>>> My question is, does it make sense to run multiple proxies? Is there
>>>> a limit? Right now I'm trying with 10 nodes and 10 proxies (one running
>>>> on every node). Maybe that doesn't make sense or it's a buggy
>>>> configuration.
>>>>
>>>>
>>>>
>>>> On Fri, Jan 31, 2014 at 7:29 PM, Josh Elser <josh.elser@gmail.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> When you had multiple proxies, what were the failures on that tablet
>>>>> server (10.202.6.46:9997)?
>>>>>
>>>>> I'm curious why using one proxy didn't cause errors but multiple did.
>>>>>
>>>>>
>>>>> On 1/31/14, 4:44 PM, Diego Woitasen wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> I've reproduced the error and I've found this in the proxy logs:
>>>>>>
>>>>>>        2014-01-31 19:47:50,430 [server.THsHaServer] WARN : Got an IOException in internalRead!
>>>>>>        java.io.IOException: Connection reset by peer
>>>>>>            at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>>>>>            at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>>>>>            at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>>>>>>            at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>>>>>>            at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>>>>>>            at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:141)
>>>>>>            at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:515)
>>>>>>            at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:305)
>>>>>>            at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:202)
>>>>>>            at org.apache.thrift.server.TNonblockingServer$SelectAcceptThread.select(TNonblockingServer.java:198)
>>>>>>            at org.apache.thrift.server.TNonblockingServer$SelectAcceptThread.run(TNonblockingServer.java:154)
>>>>>>        2014-01-31 19:51:13,185 [impl.ThriftTransportPool] WARN : Server 10.202.6.46:9997:9997 (30000) had 20 failures in a short time period, will not complain anymore
>>>>>>
>>>>>> A lot of these messages appear in all the proxies.
>>>>>>
>>>>>> I tried the same stress tests against one proxy and I was able to
>>>>>> increase the load without getting any error.
>>>>>>
>>>>>> Regards,
>>>>>>      Diego
>>>>>>
>>>>>> On Thu, Jan 30, 2014 at 2:47 PM, Keith Turner <keith@deenlo.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Do you see more information in the proxy logs?  "# exceptions 1"
>>>>>>> indicates an unexpected exception occurred in the batch writer client
>>>>>>> code. The proxy uses this client code, so maybe there will be a more
>>>>>>> detailed stack trace in its logs.
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jan 30, 2014 at 9:46 AM, Diego Woitasen
>>>>>>> <diego.woitasen@vhgroup.net>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>     I'm testing with a ten-node cluster with the proxy enabled on
>>>>>>>> all the nodes. I'm doing a stress test, balancing the connections
>>>>>>>> between the proxies using round robin. When I increase the load
>>>>>>>> (400 workers writing) I get this error:
>>>>>>>>
>>>>>>>> AccumuloSecurityException:
>>>>>>>> AccumuloSecurityException(msg='org.apache.accumulo.core.client.MutationsRejectedException: # constraint violations : 0  security codes: []  # server errors 0 # exceptions 1')
>>>>>>>>
>>>>>>>> The complete message is:
>>>>>>>>
>>>>>>>> AccumuloSecurityException:
>>>>>>>> AccumuloSecurityException(msg='org.apache.accumulo.core.client.MutationsRejectedException: # constraint violations : 0  security codes: []  # server errors 0 # exceptions 1')
>>>>>>>> kvlayer-test client failed!
>>>>>>>> Traceback (most recent call last):
>>>>>>>>   File "tests/kvlayer/test_accumulo_throughput.py", line 64, in __call__
>>>>>>>>     self.client.put('t1', ((u,), self.one_mb))
>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/kvlayer-0.2.7-py2.7.egg/kvlayer/_decorators.py", line 26, in wrapper
>>>>>>>>     return method(*args, **kwargs)
>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/kvlayer-0.2.7-py2.7.egg/kvlayer/_accumulo.py", line 154, in put
>>>>>>>>     batch_writer.close()
>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/__init__.py", line 126, in close
>>>>>>>>     self._conn.client.closeWriter(self._writer)
>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/proxy/AccumuloProxy.py", line 3149, in closeWriter
>>>>>>>>     self.recv_closeWriter()
>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/proxy/AccumuloProxy.py", line 3172, in recv_closeWriter
>>>>>>>>     raise result.ouch2
>>>>>>>>
>>>>>>>> I'm not sure if the error is produced by the way I'm using the
>>>>>>>> cluster with multiple proxies; maybe I should use just one.
>>>>>>>>
>>>>>>>> Ideas are welcome.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>      Diego
>>>>>>>>
>>>>>>>> --
>>>>>>>> Diego Woitasen
>>>>>>>> VHGroup - Linux and Open Source solutions architect
>>>>>>>> www.vhgroup.net



-- 
Diego Woitasen
VHGroup - Linux and Open Source solutions architect
www.vhgroup.net
