accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Error stressing with pyaccumulo app
Date Mon, 10 Feb 2014 21:21:26 GMT
I assume you're running a datanode alongside the tserver on that node?
That may be stretching the capabilities of that node (not to mention EC2
nodes tend to be a little flaky in general). With a 3G JVM heap plus 3G of
native maps on a 7.5 GB machine, there's little headroom left for the OS
and the datanode, so 2G for tserver.memory.maps.max might be a little safer.

You got an error in a tserver log about that IOException in internalRead.
After that, was the tserver still alive? And the proxy client was dead; did
it quit normally?

If that's the case, the proxy might just be disconnecting in a noisy manner?

On 2/10/14, 3:38 PM, Diego Woitasen wrote:
> Hi,
>   I tried increasing tserver.memory.maps.max to 3G and it failed
> again, but with a different error. I have a heap size of 3G and 7.5 GB of
> total RAM.
>
> The error that I found in the log of the affected tserver is:
>
> 2014-02-08 03:37:35,497 [util.TServerUtils$THsHaServer] WARN : Got an
> IOException in internalRead!
>
> The tserver didn't crash, but the client was disconnected during the test.
>
> Any other hints are welcome :)
>
> On Mon, Feb 3, 2014 at 3:58 PM, Josh Elser <josh.elser@gmail.com> wrote:
>> Oh, ok. So that isn't quite as bad as it seems.
>>
>> The "commits are held" exception is thrown when the tserver is running low
>> on memory. The tserver will block new mutations coming in until it can
>> process the ones it already has and free up some memory. It makes sense
>> that you would see this more often when you have more proxy servers, as the
>> total volume of Mutations you can send to your Accumulo instance is
>> increased. With one proxy server, your tserver had enough memory to process
>> the incoming data. With many proxy servers, your tservers would likely fall
>> over eventually because they'd get bogged down in JVM garbage collection.
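>>
>> As a client-side illustration, a held commit eventually surfaces as an
>> exception when the batch writer flushes or closes. Here's a minimal
>> pyaccumulo sketch that backs off and retries instead of dying (the host,
>> credentials, table name, and retry policy are made-up examples, not from
>> your setup):
>>
>>     import time
>>     from pyaccumulo import Accumulo, Mutation
>>
>>     conn = Accumulo(host='proxy-host', port=42424, user='root', password='secret')
>>     for attempt in range(5):
>>         writer = conn.create_batch_writer('t1')
>>         m = Mutation('row1')
>>         m.put(cf='cf', cq='cq', val='value')
>>         writer.add_mutation(m)
>>         try:
>>             writer.close()  # flushes; raises if the tserver rejected the mutations
>>             break
>>         except Exception:  # e.g. the proxy's AccumuloSecurityException (result.ouch2)
>>             time.sleep(2 ** attempt)  # back off while commits are held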
>>
>> If you have more memory that you can give to the tservers, that would help.
>> Also, you should make sure that you're using the Accumulo native maps, as
>> these use off-JVM-heap space instead of JVM heap, which should help
>> tremendously with your ingest rates.
>>
>> Native maps should be on by default unless you turned them off using the
>> property 'tserver.memory.maps.native.enabled' in accumulo-site.xml.
>> Additionally, you can try increasing the size of the native maps using
>> 'tserver.memory.maps.max' in accumulo-site.xml. Just be aware that with the
>> native maps, you need to ensure that total_ram > JVM_heap +
>> tserver.memory.maps.max
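>>
>> For reference, the relevant accumulo-site.xml entries would look roughly
>> like this (the 2G value is only an example; size it so JVM_heap +
>> tserver.memory.maps.max stays under total_ram):
>>
>>     <property>
>>       <name>tserver.memory.maps.native.enabled</name>
>>       <value>true</value>
>>     </property>
>>     <property>
>>       <name>tserver.memory.maps.max</name>
>>       <value>2G</value>
>>     </property>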
>>
>> - Josh
>>
>>
>> On 2/3/14, 1:33 PM, Diego Woitasen wrote:
>>>
>>> I've launched the cluster again and I was able to reproduce the error:
>>>
>>> In the proxy I had the same error that I mentioned in one of my previous
>>> messages, about a failure in a tablet server. I checked the log of that
>>> tablet server and I found:
>>>
>>> 2014-02-03 18:02:24,065 [thrift.ProcessFunction] ERROR: Internal error
>>> processing update
>>> org.apache.accumulo.server.tabletserver.HoldTimeoutException: Commits are
>>> held
>>>
>>> This appeared a lot of times.
>>>
>>> Full log, if someone wants to have a look:
>>>
>>> http://www.vhgroup.net/diegows/tserver_matrix-slave-07.accumulo-ec2-test.com.debug.log
>>>
>>> Regards,
>>>     Diego
>>>
>>> On Mon, Feb 3, 2014 at 12:11 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>>>
>>>> I would assume that a single proxy service would become a bottleneck fairly
>>>> quickly and your throughput would benefit from running multiple proxies,
>>>> but I don't have substantive numbers to back up that assertion.
>>>>
>>>> I'll put this on my list and see if I can reproduce something.
>>>>
>>>>
>>>> On 2/3/14, 7:42 AM, Diego Woitasen wrote:
>>>>>
>>>>>
>>>>> I have to run the tests again because they were on EC2 instances that I've
>>>>> destroyed. It's easy to reproduce, BTW.
>>>>>
>>>>> My question is, does it make sense to run multiple proxies? Is there
>>>>> a limit? Right now I'm trying with 10 nodes and 10 proxies (running on
>>>>> every node). Maybe that doesn't make sense or it's a buggy
>>>>> configuration.
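>>>>>
>>>>> The round-robin setup is roughly this pyaccumulo sketch (proxy
>>>>> hostnames and credentials are made up):
>>>>>
>>>>>     from itertools import cycle
>>>>>     from pyaccumulo import Accumulo
>>>>>
>>>>>     # one proxy per node; each worker takes the next proxy in turn
>>>>>     proxies = cycle(['node%02d.example.com' % i for i in range(1, 11)])
>>>>>
>>>>>     def connect():
>>>>>         return Accumulo(host=next(proxies), port=42424,
>>>>>                         user='root', password='secret')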
>>>>>
>>>>> On Fri, Jan 31, 2014 at 7:29 PM, Josh Elser <josh.elser@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> When you had multiple proxies, what were the failures on that tablet
>>>>>> server (10.202.6.46:9997)?
>>>>>>
>>>>>> I'm curious why using one proxy didn't cause errors but multiple did.
>>>>>>
>>>>>>
>>>>>> On 1/31/14, 4:44 PM, Diego Woitasen wrote:
>>>>>>>
>>>>>>> I've reproduced the error and I've found this in the proxy logs:
>>>>>>>
>>>>>>> 2014-01-31 19:47:50,430 [server.THsHaServer] WARN : Got an
>>>>>>> IOException in internalRead!
>>>>>>> java.io.IOException: Connection reset by peer
>>>>>>>     at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>>>>>>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>>>>>>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>>>>>>>     at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>>>>>>>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>>>>>>>     at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:141)
>>>>>>>     at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:515)
>>>>>>>     at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:305)
>>>>>>>     at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:202)
>>>>>>>     at org.apache.thrift.server.TNonblockingServer$SelectAcceptThread.select(TNonblockingServer.java:198)
>>>>>>>     at org.apache.thrift.server.TNonblockingServer$SelectAcceptThread.run(TNonblockingServer.java:154)
>>>>>>> 2014-01-31 19:51:13,185 [impl.ThriftTransportPool] WARN : Server
>>>>>>> 10.202.6.46:9997:9997 (30000) had 20 failures in a short time period,
>>>>>>> will not complain anymore
>>>>>>>
>>>>>>> A lot of these messages appear in all the proxies.
>>>>>>>
>>>>>>> I tried the same stress test against one proxy and I was able to
>>>>>>> increase the load without getting any errors.
>>>>>>>
>>>>>>> Regards,
>>>>>>>       Diego
>>>>>>>
>>>>>>> On Thu, Jan 30, 2014 at 2:47 PM, Keith Turner <keith@deenlo.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Do you see more information in the proxy logs?  "# exceptions 1"
>>>>>>>> indicates an unexpected exception occurred in the batch writer client
>>>>>>>> code.  The proxy uses this client code, so maybe there will be a more
>>>>>>>> detailed stack trace in its logs.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jan 30, 2014 at 9:46 AM, Diego Woitasen
>>>>>>>> <diego.woitasen@vhgroup.net>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>      I'm testing with a ten-node cluster with the proxy enabled on
>>>>>>>>> all the nodes. I'm doing a stress test, balancing the connections
>>>>>>>>> between the proxies using round robin. When I increase the load (400
>>>>>>>>> workers writing) I get this error:
>>>>>>>>>
>>>>>>>>> AccumuloSecurityException:
>>>>>>>>> AccumuloSecurityException(msg='org.apache.accumulo.core.client.MutationsRejectedException:
>>>>>>>>> # constraint violations : 0  security codes: []  # server errors 0
>>>>>>>>> # exceptions 1')
>>>>>>>>>
>>>>>>>>> The complete message is:
>>>>>>>>>
>>>>>>>>> AccumuloSecurityException:
>>>>>>>>> AccumuloSecurityException(msg='org.apache.accumulo.core.client.MutationsRejectedException:
>>>>>>>>> # constraint violations : 0  security codes: []  # server errors 0
>>>>>>>>> # exceptions 1')
>>>>>>>>> kvlayer-test client failed!
>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>   File "tests/kvlayer/test_accumulo_throughput.py", line 64, in __call__
>>>>>>>>>     self.client.put('t1', ((u,), self.one_mb))
>>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/kvlayer-0.2.7-py2.7.egg/kvlayer/_decorators.py", line 26, in wrapper
>>>>>>>>>     return method(*args, **kwargs)
>>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/kvlayer-0.2.7-py2.7.egg/kvlayer/_accumulo.py", line 154, in put
>>>>>>>>>     batch_writer.close()
>>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/__init__.py", line 126, in close
>>>>>>>>>     self._conn.client.closeWriter(self._writer)
>>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/proxy/AccumuloProxy.py", line 3149, in closeWriter
>>>>>>>>>     self.recv_closeWriter()
>>>>>>>>>   File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/proxy/AccumuloProxy.py", line 3172, in recv_closeWriter
>>>>>>>>>     raise result.ouch2
>>>>>>>>>
>>>>>>>>> I'm not sure if the error is produced by the way I'm using the
>>>>>>>>> cluster with multiple proxies; maybe I should use one.
>>>>>>>>>
>>>>>>>>> Ideas are welcome.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>       Diego
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Diego Woitasen
>>>>>>>>> VHGroup - Linux and Open Source solutions architect
>>>>>>>>> www.vhgroup.net
