hadoop-hdfs-issues mailing list archives

From "Shashikant Banerjee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDDS-728) Datanodes are going to dead state after some interval
Date Thu, 25 Oct 2018 08:09:00 GMT

    [ https://issues.apache.org/jira/browse/HDDS-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663413#comment-16663413
] 

Shashikant Banerjee commented on HDDS-728:
------------------------------------------

The datanode is shutting down because the executors in the ContainerStateMachine have been
shut down, but the same state machine is being reused for a different pipeline. The
applyTransaction call gets rejected, which terminates the Raft server.
{code:java}
2018-10-24 10:48:07,743 ERROR org.apache.ratis.server.impl.RaftServerImpl: 2974da2b-e765-43f9-8d30-45fe40dcb9ab:
applyTransaction failed for index:1 proto:(t:1, i:1)SMLOGENTRY, client-0A3AE1EA9FAB, cid=0
2018-10-24 10:48:07,746 ERROR org.apache.ratis.server.impl.StateMachineUpdater: Terminating
with exit status 2: StateMachineUpdater-2974da2b-e765-43f9-8d30-45fe40dcb9ab: the StateMachineUpdater
hits Throwable
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.CompletableFuture$AsyncSupply@68bc347e
rejected from java.util.concurrent.ThreadPoolExecutor@667bc719[Terminated, pool size = 0,
active threads = 0, queued tasks = 0, completed tasks = 27]
	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369)
	at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668)
	at java.util.concurrent.CompletableFuture.asyncSupplyStage(CompletableFuture.java:1604)
	at java.util.concurrent.CompletableFuture.supplyAsync(CompletableFuture.java:1830)
	at org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.applyTransaction(ContainerStateMachine.java:493)
	at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(RaftServerImpl.java:1093)
	at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:148)
	at java.lang.Thread.run(Thread.java:745)
2018-10-24 10:48:07,751 INFO org.apache.hadoop.ozone.HddsDatanodeService: SHUTDOWN_MSG:
{code}
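
The failure mode in the trace above can be reproduced in isolation. The following is only a minimal sketch, not the actual Ozone code: the class SketchStateMachine and its methods are hypothetical stand-ins for a state machine that owns an executor, and the "fresh instance per pipeline" variant is one possible way to avoid the rejection, not necessarily the fix adopted for this issue. It relies only on the documented JDK behaviour that CompletableFuture.supplyAsync throws RejectedExecutionException when handed an executor that has already been shut down.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class ReusedStateMachineSketch {

    /** Hypothetical stand-in for a state machine that owns its executor. */
    static final class SketchStateMachine {
        private final ExecutorService executor = Executors.newSingleThreadExecutor();

        /** Submits the work to the owned executor, as applyTransaction does in the trace. */
        CompletableFuture<String> applyTransaction(long index) {
            return CompletableFuture.supplyAsync(() -> "applied index " + index, executor);
        }

        /** Shuts the executor down, e.g. when the pipeline it served is closed. */
        void close() {
            executor.shutdown();
        }
    }

    /** Reuses a closed instance: the second applyTransaction is rejected. */
    static boolean rejectedAfterClose() {
        SketchStateMachine sm = new SketchStateMachine();
        sm.applyTransaction(0).join();   // first pipeline: accepted
        sm.close();                      // first pipeline goes away
        try {
            sm.applyTransaction(1);      // same instance reused for another pipeline
            return false;
        } catch (RejectedExecutionException e) {
            return true;                 // same exception type as in the datanode log
        }
    }

    /** Assumed alternative: a new instance (and executor) per pipeline. */
    static boolean freshInstanceAccepts() {
        SketchStateMachine first = new SketchStateMachine();
        first.applyTransaction(0).join();
        first.close();

        SketchStateMachine second = new SketchStateMachine();
        try {
            return second.applyTransaction(1).join().equals("applied index 1");
        } finally {
            second.close();
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected after close: " + rejectedAfterClose());
        System.out.println("fresh instance accepts: " + freshInstanceAccepts());
    }
}
```

In the sketch the caller catches the exception; in the log above it propagates out of the StateMachineUpdater thread, which is why the process exits.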
 

> Datanodes are going to dead state after some interval
> -----------------------------------------------------
>
>                 Key: HDDS-728
>                 URL: https://issues.apache.org/jira/browse/HDDS-728
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: Ozone Filesystem
>    Affects Versions: 0.3.0
>            Reporter: Soumitra Sulav
>            Priority: Major
>         Attachments: hadoop-root-datanode-ctr-e138-1518143905142-541600-02-000002.hwx.site.log,
hadoop-root-datanode-ctr-e138-1518143905142-541600-02-000003.hwx.site.log, hadoop-root-om-ctr-e138-1518143905142-541600-02-000002.hwx.site.log,
hadoop-root-scm-ctr-e138-1518143905142-541600-02-000002.hwx.site.log, om-audit-ctr-e138-1518143905142-541600-02-000002.hwx.site.log
>
>
> Set up a 5-datanode Ozone cluster with HDP on top of it.
> After restarting all HDP services a few times, encountered the issue below, which is causing
> the HDP services to fail.
> The same exception was observed in an old setup, where I thought it could have been an issue
> with the setup, but the same issue has now been encountered in a new setup as well.
> {code:java}
> 2018-10-24 10:42:03,308 WARN org.apache.ratis.grpc.server.GrpcServerProtocolService:
2974da2b-e765-43f9-8d30-45fe40dcb9ab: Failed requestVote 1672d28e-800f-4318-895b-1648976acff6->2974da2b-e765-43f9-8d30-45fe40dcb9ab#0
> org.apache.ratis.protocol.GroupMismatchException: 2974da2b-e765-43f9-8d30-45fe40dcb9ab:
group-CE87A994686F not found.
> at org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:114)
> at org.apache.ratis.server.impl.RaftServerProxy.getImplFuture(RaftServerProxy.java:252)
> at org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:261)
> at org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:256)
> at org.apache.ratis.server.impl.RaftServerProxy.requestVote(RaftServerProxy.java:411)
> at org.apache.ratis.grpc.server.GrpcServerProtocolService.requestVote(GrpcServerProtocolService.java:54)
> at org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$MethodHandlers.invoke(RaftServerProtocolServiceGrpc.java:319)
> at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
> at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
> at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707)
> at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-10-24 10:42:03,342 WARN org.apache.ratis.grpc.server.GrpcServerProtocolService:
2974da2b-e765-43f9-8d30-45fe40dcb9ab: Failed requestVote 7839294e-5657-447f-b320-6b390fffb963->2974da2b-e765-43f9-8d30-45fe40dcb9ab#0
> org.apache.ratis.protocol.GroupMismatchException: 2974da2b-e765-43f9-8d30-45fe40dcb9ab:
group-CE87A994686F not found.
> at org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:114)
> at org.apache.ratis.server.impl.RaftServerProxy.getImplFuture(RaftServerProxy.java:252)
> at org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:261)
> at org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:256)
> at org.apache.ratis.server.impl.RaftServerProxy.requestVote(RaftServerProxy.java:411)
> at org.apache.ratis.grpc.server.GrpcServerProtocolService.requestVote(GrpcServerProtocolService.java:54)
> at org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$MethodHandlers.invoke(RaftServerProtocolServiceGrpc.java:319)
> at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
> at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
> at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707)
> at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> 2018-10-24 10:42:04,466 WARN org.apache.ratis.grpc.server.GrpcServerProtocolService:
2974da2b-e765-43f9-8d30-45fe40dcb9ab: Failed requestVote 1672d28e-800f-4318-895b-1648976acff6->2974da2b-e765-43f9-8d30-45fe40dcb9ab#0
> org.apache.ratis.protocol.GroupMismatchException: 2974da2b-e765-43f9-8d30-45fe40dcb9ab:
group-CE87A994686F not found.
> at org.apache.ratis.server.impl.RaftServerProxy$ImplMap.get(RaftServerProxy.java:114)
> at org.apache.ratis.server.impl.RaftServerProxy.getImplFuture(RaftServerProxy.java:252)
> at org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:261)
> at org.apache.ratis.server.impl.RaftServerProxy.getImpl(RaftServerProxy.java:256)
> at org.apache.ratis.server.impl.RaftServerProxy.requestVote(RaftServerProxy.java:411)
> at org.apache.ratis.grpc.server.GrpcServerProtocolService.requestVote(GrpcServerProtocolService.java:54)
> at org.apache.ratis.proto.grpc.RaftServerProtocolServiceGrpc$MethodHandlers.invoke(RaftServerProtocolServiceGrpc.java:319)
> at org.apache.ratis.thirdparty.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:171)
> at org.apache.ratis.thirdparty.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:283)
> at org.apache.ratis.thirdparty.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:707)
> at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
> at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
