accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Error stressing with pyaccumulo app
Date Mon, 03 Feb 2014 18:58:37 GMT
Oh, ok. So that isn't quite as bad as it seems.

The "commits are held" exception is thrown when the tserver is running 
low on memory. The tserver blocks new mutations coming in until it can 
process the ones it already has and free up some memory. It makes sense 
that you would see this more often when you have more proxy servers, 
since the total volume of mutations you can send to your Accumulo 
instance is increased. With one proxy server, your tserver had enough 
memory to process the incoming data. With many proxy servers, your 
tservers would likely fall over eventually because they'll get bogged 
down in JVM garbage collection.
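
For what it's worth, a client that hits this can back off and retry 
instead of failing outright. This is a generic sketch, not part of 
pyaccumulo or the proxy API; `write_with_backoff` and `flaky_write` are 
hypothetical names used purely for illustration:

```python
import time

def write_with_backoff(write_fn, max_attempts=5, base_delay=0.1):
    """Retry a write that can fail transiently while commits are held.

    write_fn is any zero-argument callable that performs the write and
    raises on failure. The delay doubles after each failed attempt.
    """
    for attempt in range(max_attempts):
        try:
            return write_fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))

# Simulated write that fails twice (as if commits were held) then succeeds.
calls = {"n": 0}
def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("Commits are held")
    return "ok"

result = write_with_backoff(flaky_write, base_delay=0.001)
print(result)  # -> ok
```

Backoff only papers over the symptom, of course; the real fix is giving 
the tservers enough memory to keep up.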

If you have more memory that you can give the tservers, that would help. 
Also, make sure that you're using the Accumulo native maps, as these use 
memory off the JVM heap instead of on it, which should help tremendously 
with your ingest rates.

Native maps should be on by default unless you turned them off using the 
property 'tserver.memory.maps.native.enabled' in accumulo-site.xml. 
Additionally, you can try increasing the size of the native maps using 
'tserver.memory.maps.max' in accumulo-site.xml. Just be aware that with 
the native maps, you need to ensure that total_ram > JVM_heap + 
tserver.memory.maps.max
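
For reference, both properties go in accumulo-site.xml. The values 
below are illustrative only; size them to your hardware:

```xml
<!-- accumulo-site.xml: illustrative values, not recommendations -->
<property>
  <name>tserver.memory.maps.native.enabled</name>
  <value>true</value>
</property>
<property>
  <name>tserver.memory.maps.max</name>
  <value>2G</value>
</property>
```

For example, with tserver.memory.maps.max at 2G and a 3 GB JVM heap 
(-Xmx3g), the machine needs comfortably more than 5 GB of RAM for the 
tserver alone.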

- Josh

On 2/3/14, 1:33 PM, Diego Woitasen wrote:
> I've launched the cluster again and I was able to reproduce the error:
>
> In the proxy I had the same error that I mentioned in one of my previous
> messages, about a failure in a tablet server. I checked the log of that
> tablet server and found:
>
> 2014-02-03 18:02:24,065 [thrift.ProcessFunction] ERROR: Internal error
> processing update
> org.apache.accumulo.server.tabletserver.HoldTimeoutException: Commits are held
>
> This appeared a lot of times.
>
> Full log, if someone wants to have a look:
> http://www.vhgroup.net/diegows/tserver_matrix-slave-07.accumulo-ec2-test.com.debug.log
>
> Regards,
>    Diego
>
> On Mon, Feb 3, 2014 at 12:11 PM, Josh Elser <josh.elser@gmail.com> wrote:
>> I would assume that the proxy service would become a bottleneck fairly
>> quickly and your throughput would benefit from running multiple proxies,
>> but I don't have substantive numbers to back up that assertion.
>>
>> I'll put this on my list and see if I can reproduce something.
>>
>>
>> On 2/3/14, 7:42 AM, Diego Woitasen wrote:
>>>
>>> I have to run the tests again because they were ec2 instances and I've
>>> destroyed them. It's easy to reproduce, BTW.
>>>
>>> My question is, does it make sense to run multiple proxies? Is there
>>> a limit? Right now I'm trying with 10 nodes and 10 proxies (one running
>>> on every node). Maybe that doesn't make sense, or it's a buggy
>>> configuration.
>>>
>>>
>>>
>>> On Fri, Jan 31, 2014 at 7:29 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>>>
>>>> When you had multiple proxies, what were the failures on that tablet
>>>> server (10.202.6.46:9997)?
>>>>
>>>> I'm curious why using one proxy didn't cause errors but multiple did.
>>>>
>>>>
>>>> On 1/31/14, 4:44 PM, Diego Woitasen wrote:
>>>>>
>>>>>
>>>>> I've reproduced the error and I've found this in the proxy logs:
>>>>>
>>>>>        2014-01-31 19:47:50,430 [server.THsHaServer] WARN : Got an IOException in internalRead!
>>>>>        java.io.IOException: Connection reset by peer
>>>>>            at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>>>>            at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>>>>            at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>>>>>            at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>>>>>            at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>>>>>            at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:141)
>>>>>            at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:515)
>>>>>            at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:305)
>>>>>            at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:202)
>>>>>            at org.apache.thrift.server.TNonblockingServer$SelectAcceptThread.select(TNonblockingServer.java:198)
>>>>>            at org.apache.thrift.server.TNonblockingServer$SelectAcceptThread.run(TNonblockingServer.java:154)
>>>>>        2014-01-31 19:51:13,185 [impl.ThriftTransportPool] WARN : Server 10.202.6.46:9997:9997 (30000) had 20 failures in a short time period, will not complain anymore
>>>>>
>>>>> A lot of these messages appear in all the proxies.
>>>>>
>>>>> I tried the same stress tests against one proxy and I was able to
>>>>> increase the load without getting any error.
>>>>>
>>>>> Regards,
>>>>>      Diego
>>>>>
>>>>> On Thu, Jan 30, 2014 at 2:47 PM, Keith Turner <keith@deenlo.com> wrote:
>>>>>>
>>>>>> Do you see more information in the proxy logs?  "# exceptions 1"
>>>>>> indicates an unexpected exception occurred in the batch writer client
>>>>>> code.  The proxy uses this client code, so maybe there will be a more
>>>>>> detailed stack trace in its logs.
>>>>>>
>>>>>>
>>>>>> On Thu, Jan 30, 2014 at 9:46 AM, Diego Woitasen <diego.woitasen@vhgroup.net> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>     I'm testing with a ten-node cluster with the proxy enabled on all
>>>>>>> the nodes. I'm doing a stress test, balancing the connections between
>>>>>>> the proxies using round robin. When I increase the load (400 workers
>>>>>>> writing) I get this error:
>>>>>>>
>>>>>>> AccumuloSecurityException:
>>>>>>>
>>>>>>> AccumuloSecurityException(msg='org.apache.accumulo.core.client.MutationsRejectedException: # constraint violations : 0  security codes: []  # server errors 0  # exceptions 1')
>>>>>>>
>>>>>>> The complete message is:
>>>>>>>
>>>>>>> AccumuloSecurityException:
>>>>>>>
>>>>>>> AccumuloSecurityException(msg='org.apache.accumulo.core.client.MutationsRejectedException: # constraint violations : 0  security codes: []  # server errors 0  # exceptions 1')
>>>>>>> kvlayer-test client failed!
>>>>>>> Traceback (most recent call last):
>>>>>>>   File "tests/kvlayer/test_accumulo_throughput.py", line 64, in __call__
>>>>>>>     self.client.put('t1', ((u,), self.one_mb))
>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/kvlayer-0.2.7-py2.7.egg/kvlayer/_decorators.py", line 26, in wrapper
>>>>>>>     return method(*args, **kwargs)
>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/kvlayer-0.2.7-py2.7.egg/kvlayer/_accumulo.py", line 154, in put
>>>>>>>     batch_writer.close()
>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/__init__.py", line 126, in close
>>>>>>>     self._conn.client.closeWriter(self._writer)
>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/proxy/AccumuloProxy.py", line 3149, in closeWriter
>>>>>>>     self.recv_closeWriter()
>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/proxy/AccumuloProxy.py", line 3172, in recv_closeWriter
>>>>>>>     raise result.ouch2
>>>>>>>
>>>>>>> I'm not sure if the error is produced by the way I'm using the
>>>>>>> cluster with multiple proxies; maybe I should use just one.
>>>>>>>
>>>>>>> Ideas are welcome.
>>>>>>>
>>>>>>> Regards,
>>>>>>>      Diego
>>>>>>>
>>>>>>> --
>>>>>>> Diego Woitasen
>>>>>>> VHGroup - Linux and Open Source solutions architect
>>>>>>> www.vhgroup.net
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>
>
>
