accumulo-notifications mailing list archives

From "Keith Turner (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-2388) Continuous Ingest clients die
Date Wed, 16 Apr 2014 17:42:19 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971713#comment-13971713 ]

Keith Turner commented on ACCUMULO-2388:
----------------------------------------

I was able to do another run with ACCUMULO-2668, and it was successful.  I analyzed all of the
tserver logs and still saw some high walog times.  So the potential for the problem I saw
previously was still there; the slow walog writes just did not all happen to occur on one node
at the same time.

{noformat}
[cluster@ip-10-1-2-10 continuous]$ pssh -i -h ingesters.txt 'grep -a "UpSess" /opt/accumulo-1.6.0/logs/*tserver* | egrep -a -v "lt=[0-9]?[0-9][.][0-9]"'
[12] 17:31:55 [SUCCESS] ip-10-1-2-18
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-18.ec2.internal.debug.log:2014-04-15 06:17:24,337
[tserver.TabletServer] DEBUG: UpSess 10.1.2.28:48032 16,975 in 102.868s, at=[0 0 0.00 32]
ft=102.667s(pt=0.090s lt=102.317s ct=0.260s)
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-18.ec2.internal.debug.log:2014-04-15 06:17:38,671
[tserver.TabletServer] DEBUG: UpSess 10.1.2.19:52736 16,842 in 118.418s, at=[0 0 0.00 32]
ft=118.355s(pt=0.008s lt=118.178s ct=0.169s)
[17] 17:31:58 [SUCCESS] ip-10-1-2-23
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-23.ec2.internal.debug.log:2014-04-15 06:17:36,850
[tserver.TabletServer] DEBUG: UpSess 10.1.2.20:50978 17,092 in 137.192s, at=[0 1 0.03 32]
ft=137.122s(pt=0.006s lt=136.790s ct=0.326s)
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-23.ec2.internal.debug.log:2014-04-15 06:17:36,896
[tserver.TabletServer] DEBUG: UpSess 10.1.2.16:38184 24,779 in 137.509s, at=[0 0 0.00 32]
ft=137.295s(pt=0.008s lt=136.912s ct=0.375s)
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-23.ec2.internal.debug.log:2014-04-15 06:17:37,429
[tserver.TabletServer] DEBUG: UpSess 10.1.2.21:59812 17,032 in 138.041s, at=[0 0 0.00 32]
ft=137.855s(pt=0.013s lt=137.715s ct=0.127s)
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-23.ec2.internal.debug.log:2014-04-15 06:17:39,004
[tserver.TabletServer] DEBUG: UpSess 10.1.2.24:32948 8,003 in 139.603s, at=[0 0 0.00 32] ft=139.400s(pt=0.003s
lt=139.190s ct=0.207s)
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-23.ec2.internal.debug.log:2014-04-15 06:17:39,070
[tserver.TabletServer] DEBUG: UpSess 10.1.2.14:37183 16,372 in 139.701s, at=[0 0 0.00 32]
ft=139.506s(pt=0.008s lt=139.228s ct=0.270s)
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-23.ec2.internal.debug.log:2014-04-15 06:17:39,071
[tserver.TabletServer] DEBUG: UpSess 10.1.2.27:40775 16,313 in 139.670s, at=[0 1 0.03 32]
ft=139.497s(pt=0.012s lt=139.211s ct=0.274s)
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-23.ec2.internal.debug.log:2014-04-15 06:17:39,345
[tserver.TabletServer] DEBUG: UpSess 10.1.2.22:37906 33,858 in 140.193s, at=[0 0 0.00 32]
ft=140.124s(pt=0.045s lt=139.540s ct=0.539s)
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-23.ec2.internal.debug.log:2014-04-15 06:17:41,388
[tserver.TabletServer] DEBUG: UpSess 10.1.2.17:39697 33,352 in 142.488s, at=[0 0 0.00 32]
ft=142.351s(pt=0.015s lt=141.952s ct=0.384s)
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-23.ec2.internal.debug.log:2014-04-15 06:17:47,494
[tserver.TabletServer] DEBUG: UpSess 10.1.2.15:43946 8,477 in 120.655s, at=[0 1 0.03 32] ft=120.607s(pt=0.002s
lt=120.210s ct=0.395s)
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-23.ec2.internal.debug.log:2014-04-15 06:17:47,669
[tserver.TabletServer] DEBUG: UpSess 10.1.2.13:57710 17,033 in 115.454s, at=[0 1 0.03 32]
ft=115.211s(pt=0.013s lt=114.630s ct=0.568s)
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-23.ec2.internal.debug.log:2014-04-15 06:17:47,675
[tserver.TabletServer] DEBUG: UpSess 10.1.2.28:44022 16,907 in 116.482s, at=[0 0 0.00 32]
ft=116.322s(pt=0.004s lt=115.743s ct=0.575s)
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-23.ec2.internal.debug.log:2014-04-15 06:17:47,784
[tserver.TabletServer] DEBUG: UpSess 10.1.2.29:48898 25,146 in 126.834s, at=[0 0 0.00 32]
ft=126.749s(pt=0.007s lt=126.057s ct=0.685s)
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-23.ec2.internal.debug.log:2014-04-15 06:17:50,020
[tserver.TabletServer] DEBUG: UpSess 10.1.2.10:60386 2,533 in 140.164s, at=[0 0 0.00 1] ft=140.155s(pt=0.000s
lt=140.129s ct=0.026s)
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-23.ec2.internal.debug.log:2014-04-15 06:17:50,388
[tserver.TabletServer] DEBUG: UpSess 10.1.2.19:43589 16,838 in 132.732s, at=[0 1 0.03 32]
ft=132.671s(pt=0.004s lt=132.276s ct=0.391s)
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-23.ec2.internal.debug.log:2014-04-15 06:17:50,503
[tserver.TabletServer] DEBUG: UpSess 10.1.2.25:50677 25,491 in 136.344s, at=[0 0 0.00 32]
ft=136.211s(pt=0.016s lt=135.684s ct=0.511s)
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-23.ec2.internal.debug.log:2014-04-15 06:17:53,409
[tserver.TabletServer] DEBUG: UpSess 10.1.2.18:36106 16,961 in 143.786s, at=[0 1 0.06 32]
ft=143.719s(pt=0.004s lt=143.373s ct=0.342s)
{noformat}
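
The same filter can be sketched in Python for anyone who wants to post-process the logs instead of chaining greps. This is only an illustration, not what was run for this comment: the regular expression is inferred from the UpSess lines shown above, the group names and the `slow_sessions` helper are made up for the example, and the 100-second default simply mirrors what the egrep pattern lets through (an `lt=` value with three or more digits before the decimal point).

```python
import re

# Pattern inferred from the "UpSess" debug lines above. The group names
# (client, count, total, ...) are labels invented for this sketch.
UPSESS_RE = re.compile(
    r"UpSess (?P<client>\S+) (?P<count>[\d,]+) in (?P<total>[\d.]+)s, "
    r"at=\[[^\]]*\]\s+ft=(?P<ft>[\d.]+)s\("
    r"pt=(?P<pt>[\d.]+)s\s+lt=(?P<lt>[\d.]+)s\s+ct=(?P<ct>[\d.]+)s\)"
)


def slow_sessions(text, min_lt_seconds=100.0):
    """Return (client, lt) pairs whose lt value is at least min_lt_seconds."""
    # The archived email wraps long log lines, so collapse all whitespace
    # (including newlines) to single spaces before matching.
    flat = " ".join(text.split())
    return [
        (m.group("client"), float(m.group("lt")))
        for m in UPSESS_RE.finditer(flat)
        if float(m.group("lt")) >= min_lt_seconds
    ]


# Two entries copied from the output above, including the email line wrapping.
sample = """
[tserver.TabletServer] DEBUG: UpSess 10.1.2.28:48032 16,975 in 102.868s, at=[0 0 0.00 32]
ft=102.667s(pt=0.090s lt=102.317s ct=0.260s)
[tserver.TabletServer] DEBUG: UpSess 10.1.2.24:32948 8,003 in 139.603s, at=[0 0 0.00 32] ft=139.400s(pt=0.003s
lt=139.190s ct=0.207s)
"""

for client, lt in slow_sessions(sample):
    print(client, lt)
```

Raising `min_lt_seconds` narrows the result to the worst sessions; for example, `slow_sessions(sample, 120.0)` keeps only the second entry.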

{noformat}
[cluster@ip-10-1-2-10 continuous]$ pssh -i -h ingesters.txt 'grep -a "writeTime" /opt/accumulo-1.6.0/logs/*tserver* | egrep -v "writeTime:[1-2]?[0-9]?[0-9]?[0-9]?[0-9]m" '
[4] 17:34:02 [SUCCESS] ip-10-1-2-13
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-13.ec2.internal.debug.log:2014-04-15 06:17:00,243
[log.TabletServerLogger] DEBUG:  wrote MinC finish  64556: writeTime:30902ms 
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-13.ec2.internal.debug.log:2014-04-15 06:17:00,246
[log.TabletServerLogger] DEBUG:  wrote MinC finish  64557: writeTime:30663ms 
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-13.ec2.internal.debug.log:2014-04-15 06:17:03,548
[log.TabletServerLogger] DEBUG:  wrote MinC finish  64558: writeTime:33712ms 
[12] 17:34:02 [SUCCESS] ip-10-1-2-24
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-24.ec2.internal.debug.log:2014-04-16 04:32:31,653
[log.TabletServerLogger] DEBUG:  wrote MinC finish  822843: writeTime:35119ms 
[14] 17:34:02 [SUCCESS] ip-10-1-2-22
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-22.ec2.internal.debug.log:2014-04-15 18:15:27,592
[log.TabletServerLogger] DEBUG:  wrote MinC finish  153363: writeTime:39875ms 
[16] 17:34:02 [SUCCESS] ip-10-1-2-29
/opt/accumulo-1.6.0/logs/tserver_ip-10-1-2-29.ec2.internal.debug.log:2014-04-15 06:21:25,684
[log.TabletServerLogger] DEBUG:  wrote MinC finish  62514: writeTime:50973ms 
{noformat}
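
The egrep above drops every writeTime below 30000ms, so only writes of 30 seconds or more remain. A rough Python equivalent, given here as a sketch under assumptions: the pattern is inferred from the "wrote MinC finish" lines above, and the `slow_writes` helper and the `entry` label for the number printed before `writeTime` are hypothetical names chosen for this example.

```python
import re

# Pattern inferred from the TabletServerLogger lines above. "entry" is just a
# label for the number printed before writeTime; its meaning is not assumed.
WRITE_TIME_RE = re.compile(
    r"wrote MinC finish\s+(?P<entry>\d+): writeTime:(?P<ms>\d+)ms"
)


def slow_writes(text, min_ms=30000):
    """Return (entry, write_time_ms) pairs with write time >= min_ms."""
    # Collapse whitespace so wrapped lines and double spaces do not matter.
    flat = " ".join(text.split())
    return [
        (int(m.group("entry")), int(m.group("ms")))
        for m in WRITE_TIME_RE.finditer(flat)
        if int(m.group("ms")) >= min_ms
    ]


# Two entries copied from the output above.
sample = """
[log.TabletServerLogger] DEBUG:  wrote MinC finish  64556: writeTime:30902ms
[log.TabletServerLogger] DEBUG:  wrote MinC finish  62514: writeTime:50973ms
"""

for entry, ms in slow_writes(sample):
    print(entry, ms)
```

Parsing the value as an integer avoids encoding the threshold in a digit-counting regular expression, so a different cutoff such as `slow_writes(sample, 40000)` needs no pattern change.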

> Continuous Ingest clients die
> -----------------------------
>
>                 Key: ACCUMULO-2388
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2388
>             Project: Accumulo
>          Issue Type: Bug
>          Components: test, tserver
>         Environment: 1.6.0-SNAPSHOT (sha-1: 0da9a56)
> cdh4.5.0
>            Reporter: Mike Drob
>            Assignee: Mike Drob
>            Priority: Minor
>              Labels: 16_qa_bug
>             Fix For: 1.6.1
>
>         Attachments: ACCUMULO-2388-1.patch, tracer.debug.log, tserver1.log
>
>
> I was running continuous ingest on a 7-node cluster (5 slaves), and after enabling HDFS agitation my clients died.
> {code:title=ingest.err}
> Thread "org.apache.accumulo.test.continuous.ContinuousIngest" died java.lang.reflect.InvocationTargetException
> java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.accumulo.start.Main$1.run(Main.java:137)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.reflect.UndeclaredThrowableException
> at $Proxy9.addMutation(Unknown Source)
> at org.apache.accumulo.test.continuous.ContinuousIngest.main(ContinuousIngest.java:212)
> ... 6 more
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.accumulo.trace.instrument.TraceProxy$2.invoke(TraceProxy.java:43)
> ... 8 more
> Caused by: org.apache.accumulo.core.client.MutationsRejectedException: # constraint violations : 0 security codes: {} # server errors 1 # exceptions 0
> at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.checkForFailures(TabletServerBatchWriter.java:537)
> at org.apache.accumulo.core.client.impl.TabletServerBatchWriter.addMutation(TabletServerBatchWriter.java:258)
> at org.apache.accumulo.core.client.impl.BatchWriterImpl.addMutation(BatchWriterImpl.java:43)
> ... 12 more
> {code}
> {code:title=ingest.out}
> UUID 1392844086463 f822a6a9-9592-4b3a-ab3b-1c172be20b96
> FLUSH 1392844135523 49047 6165 1000000 1000000
> FLUSH 1392844165594 30071 7787 2000000 1000000
> FLUSH 1392844195875 30281 7816 3000000 1000000
> FLUSH 1392844226787 30912 8086 4000000 1000000
> FLUSH 1392844257194 30407 7989 5000000 1000000
> FLUSH 1392844287518 30324 7743 6000000 1000000
> FLUSH 1392844325833 38315 10933 7000000 1000000
> FLUSH 1392844364708 38875 7916 8000000 1000000
> FLUSH 1392844395818 31110 8104 9000000 1000000
> 2014-02-19 13:16:57,444 [impl.TabletServerBatchWriter] ERROR: Server side error on tserver1:10011: org.apache.thrift.TApplicationException: Internal error processing applyUpdates
> 2014-02-19 13:16:57,446 [impl.TabletServerBatchWriter] ERROR: Failed to send tablet server tserver1:10011 its batch : Error on server tserver1:10011
> org.apache.accumulo.core.client.impl.AccumuloServerException: Error on server tserver1:10011
> at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter.sendMutationsToTabletServer(TabletServerBatchWriter.java:937)
> at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter.access$1600(TabletServerBatchWriter.java:616)
> at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter$SendTask.send(TabletServerBatchWriter.java:801)
> at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter$SendTask.run(TabletServerBatchWriter.java:765)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at org.apache.accumulo.trace.instrument.TraceRunnable.run(TraceRunnable.java:47)
> at org.apache.accumulo.core.util.LoggingRunnable.run(LoggingRunnable.java:34)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.thrift.TApplicationException: Internal error processing applyUpdates
> at org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
> at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.recv_closeUpdate(TabletClientService.java:431)
> at org.apache.accumulo.core.tabletserver.thrift.TabletClientService$Client.closeUpdate(TabletClientService.java:417)
> at org.apache.accumulo.core.client.impl.TabletServerBatchWriter$MutationWriter.sendMutationsToTabletServer(TabletServerBatchWriter.java:899)
> ... 11 more
> {code}
> {code:title=tserver.log}
> 2014-02-19 13:16:56,156 [util.TServerUtils$THsHaServer] WARN : Got an IOException in internalRead!
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcher.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
> at sun.nio.ch.IOUtil.read(IOUtil.java:171)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
> at org.apache.thrift.transport.TNonblockingSocket.read(TNonblockingSocket.java:141)
> at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.internalRead(AbstractNonblockingServer.java:515)
> at org.apache.thrift.server.AbstractNonblockingServer$FrameBuffer.read(AbstractNonblockingServer.java:355)
> at org.apache.thrift.server.AbstractNonblockingServer$AbstractSelectThread.handleRead(AbstractNonblockingServer.java:202)
> at org.apache.thrift.server.TNonblockingServer$SelectAcceptThread.select(TNonblockingServer.java:198)
> at org.apache.thrift.server.TNonblockingServer$SelectAcceptThread.run(TNonblockingServer.java:154)
> {code}
> Note that this last message was not propagated to the monitor for some reason, but that is likely a different issue. (I had been seeing other WARN messages show up earlier.)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
