accumulo-user mailing list archives

From Diego Woitasen <diego.woita...@vhgroup.net>
Subject Re: Error stressing with pyaccumulo app
Date Mon, 10 Feb 2014 22:08:09 GMT
On Mon, Feb 10, 2014 at 6:21 PM, Josh Elser <josh.elser@gmail.com> wrote:
> I assume you're running a datanode alongside the tserver on that node? That
> may be stretching the capabilities of that node (not to mention ec2 nodes
> tend to be a little flaky in general). 2G for the tserver.memory.maps.max
> might be a little safer.
>
> You got an error in a tserver log about that IOException in internalRead.
> After that, the tserver was still alive? And the proxy client was dead -
> quit normally?

Yes, everything is still alive.

>
> If that's the case, the proxy might just be disconnecting in a noisy manner?

Right!
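If it really is just a noisy disconnect, the stress client can be made to shrug it off by retrying the write. A minimal sketch of that idea (the flaky() stand-in below is a placeholder for re-opening the proxy connection and re-sending the batch, not a real pyaccumulo call):

```python
import time

def with_retries(op, attempts=5, base_delay=0.01):
    """Call op(), retrying on IOError with exponential backoff.

    Placeholder for the real recovery step: re-open the proxy
    connection and re-send the unflushed mutations.
    """
    for i in range(attempts):
        try:
            return op()
        except IOError:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i))

# A stand-in operation that fails twice before succeeding, the way a
# proxy connection might during a "commits are held" episode.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("Connection reset by peer")
    return "ok"

result = with_retries(flaky)  # succeeds on the third attempt
```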

I'll try with 2G for tserver.memory.maps.max.
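For the record, here is the arithmetic behind picking 2G, using Josh's rule of thumb that total_ram > JVM_heap + tserver.memory.maps.max (the figures are from my setup: 7.5 GB of RAM and a 3G heap):

```python
# Figures from this cluster's nodes.
TOTAL_RAM_GB = 7.5
JVM_HEAP_GB = 3.0

def headroom(maps_max_gb):
    """RAM left for the OS, datanode, etc. after the JVM heap and the
    off-heap native maps are accounted for."""
    return TOTAL_RAM_GB - JVM_HEAP_GB - maps_max_gb

# With maps.max at 3G only 1.5 GB is left for everything else on the
# box; dropping to 2G doubles that headroom.
slack_3g = headroom(3.0)
slack_2g = headroom(2.0)
```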
>
>
> On 2/10/14, 3:38 PM, Diego Woitasen wrote:
>>
>> Hi,
>>   I tried increasing tserver.memory.maps.max to 3G and it failed
>> again, but with a different error. I have a heap size of 3G and 7.5 GB of
>> total RAM.
>>
>> The error that I've found in the crashed tserver is:
>>
>> 2014-02-08 03:37:35,497 [util.TServerUtils$THsHaServer] WARN : Got an
>> IOException in internalRead!
>>
>> The tserver hasn't crashed, but the client was disconnected during the
>> test.
>>
>> Another hint is welcome :)
>>
>> On Mon, Feb 3, 2014 at 3:58 PM, Josh Elser <josh.elser@gmail.com> wrote:
>>>
>>> Oh, ok. So that isn't quite as bad as it seems.
>>>
>>> The "commits are held" exception is thrown when the tserver is running
>>> low
>>> on memory. The tserver will block new mutations coming in until it can
>>> process the ones it already has and free up some memory. This makes sense
>>> that you would see this more often when you have more proxy servers as
>>> the
>>> total amount of Mutations you can send to your Accumulo instance is
>>> increased. With one proxy server, your tserver had enough memory to
>>> process
>>> the incoming data. With many proxy servers, your tservers would likely
>>> fall
>>> over eventually because they'll get bogged down in JVM garbage
>>> collection.
>>>
>>> If you have more memory that you can give the tservers, that would help.
>>> Also, you should make sure that you're using the Accumulo native maps as
>>> this will use off-JVM-heap space instead of JVM heap which should help
>>> tremendously with your ingest rates.
>>>
>>> Native maps should be on by default unless you turned them off using the
>>> property 'tserver.memory.maps.native.enabled' in accumulo-site.xml.
>>> Additionally, you can try increasing the size of the native maps using
>>> 'tserver.memory.maps.max' in accumulo-site.xml. Just be aware that with
>>> the
>>> native maps, you need to ensure that total_ram > JVM_heap +
>>> tserver.memory.maps.max
>>>
>>> - Josh
>>>
>>>
>>> On 2/3/14, 1:33 PM, Diego Woitasen wrote:
>>>>
>>>>
>>>> I've launched the cluster again and I was able to reproduce the error:
>>>>
>>>> In the proxy I had the same error that I mentioned in one of my previous
>>>> messages, about a failure in a tablet server. I checked the log of that
>>>> tablet server and I found:
>>>>
>>>> 2014-02-03 18:02:24,065 [thrift.ProcessFunction] ERROR: Internal error
>>>> processing update
>>>> org.apache.accumulo.server.tabletserver.HoldTimeoutException: Commits
>>>> are
>>>> held
>>>>
>>>> A lot of times.
>>>>
>>>> Full log, if someone wants to have a look:
>>>>
>>>>
>>>> http://www.vhgroup.net/diegows/tserver_matrix-slave-07.accumulo-ec2-test.com.debug.log
>>>>
>>>> Regards,
>>>>     Diego
>>>>
>>>> On Mon, Feb 3, 2014 at 12:11 PM, Josh Elser <josh.elser@gmail.com>
>>>> wrote:
>>>>>
>>>>>
>>>>> I would assume that that proxy service would become a bottleneck fairly
>>>>> quickly and your throughput would benefit from running multiple
>>>>> proxies,
>>>>> but I don't have substantive numbers to back up that assertion.
>>>>>
>>>>> I'll put this on my list and see if I can reproduce something.
>>>>>
>>>>>
>>>>> On 2/3/14, 7:42 AM, Diego Woitasen wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> I have to run the tests again because they were ec2 instances and
>>>>>> I've destroyed them. It's easy to reproduce, BTW.
>>>>>>
>>>>>> My question is, does it make sense to run multiple proxies? Is there
>>>>>> a limit? Right now I'm trying with 10 nodes and 10 proxies (running on
>>>>>> every node). Maybe that doesn't make sense, or it's a buggy
>>>>>> configuration.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 31, 2014 at 7:29 PM, Josh Elser <josh.elser@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> When you had multiple proxies, what were the failures on that tablet
>>>>>>> server (10.202.6.46:9997)?
>>>>>>>
>>>>>>> I'm curious why using one proxy didn't cause errors but multiple did.
>>>>>>>
>>>>>>>
>>>>>>> On 1/31/14, 4:44 PM, Diego Woitasen wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I've reproduced the error and I've found this in the proxy logs:
>>>>>>>>
>>>>>>>>         2014-01-31 19:47:50,430 [server.THsHaServer] WARN : Got an
>>>>>>>> IOException in internalRead!
>>>>>>>>         java.io.IOException: Connection reset by peer
>>>>>>>>             at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>>>>>>>             at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>>>>>>>             at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>>>>>>>>             at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>>>>>>>>             at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>>>>>>>>             at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:141)
>>>>>>>>             at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:515)
>>>>>>>>             at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:305)
>>>>>>>>             at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:202)
>>>>>>>>             at org.apache.thrift.server.TNonblockingServer$SelectAcceptThread.select(TNonblockingServer.java:198)
>>>>>>>>             at org.apache.thrift.server.TNonblockingServer$SelectAcceptThread.run(TNonblockingServer.java:154)
>>>>>>>>         2014-01-31 19:51:13,185 [impl.ThriftTransportPool] WARN : Server
>>>>>>>> 10.202.6.46:9997:9997 (30000) had 20 failures in a short time period,
>>>>>>>> will not complain anymore
>>>>>>>>
>>>>>>>> A lot of these messages appear in all the proxies.
>>>>>>>>
>>>>>>>> I tried the same stress test against one proxy and I was able to
>>>>>>>> increase the load without getting any error.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>       Diego
>>>>>>>>
>>>>>>>> On Thu, Jan 30, 2014 at 2:47 PM, Keith Turner <keith@deenlo.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Do you see more information in the proxy logs?  "# exceptions 1"
>>>>>>>>> indicates an unexpected exception occurred in the batch writer client
>>>>>>>>> code. The proxy uses this client code, so maybe there will be a more
>>>>>>>>> detailed stack trace in its logs.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jan 30, 2014 at 9:46 AM, Diego Woitasen
>>>>>>>>> <diego.woitasen@vhgroup.net>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>      I'm testing with a ten node cluster with the proxy enabled in
>>>>>>>>>> all the nodes. I'm doing a stress test, balancing the connections
>>>>>>>>>> between the proxies using round robin. When I increase the load
>>>>>>>>>> (400 workers writing) I get this error:
>>>>>>>>>> AccumuloSecurityException:
>>>>>>>>>> AccumuloSecurityException(msg='org.apache.accumulo.core.client.MutationsRejectedException:
>>>>>>>>>> # constraint violations : 0  security codes: []  # server errors 0
>>>>>>>>>> # exceptions 1')
>>>>>>>>>>
>>>>>>>>>> The complete message is:
>>>>>>>>>>
>>>>>>>>>> AccumuloSecurityException:
>>>>>>>>>> AccumuloSecurityException(msg='org.apache.accumulo.core.client.MutationsRejectedException:
>>>>>>>>>> # constraint violations : 0  security codes: []  # server errors 0
>>>>>>>>>> # exceptions 1')
>>>>>>>>>> kvlayer-test client failed!
>>>>>>>>>> Traceback (most recent call last):
>>>>>>>>>>       File "tests/kvlayer/test_accumulo_throughput.py", line 64, in __call__
>>>>>>>>>>         self.client.put('t1', ((u,), self.one_mb))
>>>>>>>>>>       File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/kvlayer-0.2.7-py2.7.egg/kvlayer/_decorators.py", line 26, in wrapper
>>>>>>>>>>         return method(*args, **kwargs)
>>>>>>>>>>       File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/kvlayer-0.2.7-py2.7.egg/kvlayer/_accumulo.py", line 154, in put
>>>>>>>>>>         batch_writer.close()
>>>>>>>>>>       File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/__init__.py", line 126, in close
>>>>>>>>>>         self._conn.client.closeWriter(self._writer)
>>>>>>>>>>       File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/proxy/AccumuloProxy.py", line 3149, in closeWriter
>>>>>>>>>>         self.recv_closeWriter()
>>>>>>>>>>       File "/home/ubuntu/kvlayer-env/local/lib/python2.7/site-packages/pyaccumulo_dev-1.5.0.2-py2.7.egg/pyaccumulo/proxy/AccumuloProxy.py", line 3172, in recv_closeWriter
>>>>>>>>>>         raise result.ouch2
>>>>>>>>>>
>>>>>>>>>> I'm not sure if the error is produced by the way I'm using the
>>>>>>>>>> cluster with multiple proxies; maybe I should use just one.
>>>>>>>>>>
>>>>>>>>>> Ideas are welcome.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>       Diego
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Diego Woitasen
>>>>>>>>>> VHGroup - Linux and Open Source solutions architect
>>>>>>>>>> www.vhgroup.net



-- 
Diego Woitasen
VHGroup - Linux and Open Source solutions architect
www.vhgroup.net
