hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Shashikant Banerjee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDDS-1124) java.lang.IllegalStateException exception in datanode log
Date Wed, 20 Feb 2019 13:43:00 GMT

    [ https://issues.apache.org/jira/browse/HDDS-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16773027#comment-16773027
] 

Shashikant Banerjee commented on HDDS-1124:
-------------------------------------------

172.27.76.72 being a leader sends append request to follower node 172.27.54.212 . The follower's
next index should be 3048

. Meanwhile leader election gets triggered and hence 172.27.54.212 also write a config entry
top its log at index 3048 as well as 172.27.76.72 writes a conf entry at index 3048.

 
{code:java}
2019-02-15 10:15:53,319 INFO org.apache.ratis.server.impl.RoleInfo: a40a7b01-a30b-469c-b373-9fcb20a126ed:
start FollowerState

2019-02-15 10:15:53,319 INFO org.apache.ratis.server.impl.FollowerState: a40a7b01-a30b-469c-b373-9fcb20a126ed:
FollowerState was interrupted: java.lang.InterruptedException: sleep interrupted

2019-02-15 10:15:53,463 INFO org.apache.ratis.server.impl.RaftServerImpl: a40a7b01-a30b-469c-b373-9fcb20a126ed:
change Leader from null to 943007c8-4fdd-4926-89e2-2c8c52c05073 at term 3 for appendEntries,
leader elected after 1364ms

2019-02-15 10:15:53,591 INFO org.apache.ratis.server.impl.RaftServerImpl: a40a7b01-a30b-469c-b373-9fcb20a126ed:
set configuration 3048: [a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, 8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858,
943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858], old=null at 3048

2019-02-15 10:15:53,594 INFO org.apache.ratis.server.storage.RaftLogWorker: a40a7b01-a30b-469c-b373-9fcb20a126ed-RaftLogWorker:
Truncating segments toTruncate: (3036, 3048) isOpen? true, length=11529, newEndIndex=3047

  toDelete: [], start index 3048

2019-02-15 10:15:53,594 INFO org.apache.ratis.server.storage.RaftLogWorker: a40a7b01-a30b-469c-b373-9fcb20a126ed-RaftLogWorker:
Starting segment from index:3048

2019-02-15 10:15:53,597 INFO org.apache.ratis.server.storage.RaftLogWorker: a40a7b01-a30b-469c-b373-9fcb20a126ed-RaftLogWorker:
Truncated log file /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3036
to length 11529 and moved it to /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_3036-3047{code}
When the appendEntries reply gets received on 172.27.76.72, since the follower node has incremented
the next index to 3049 (as a conf entry is written at index 3048), but requestor assumes it
to be 3048, the assertion is hit.
{code:java}
final long replyNextIndex = reply.getNextIndex();
final long lastIndex = replyNextIndex - 1;
final boolean updateMatchIndex;

if (request.getEntriesCount() == 0) {
  Preconditions.assertTrue(!request.hasPreviousLog() ||
          lastIndex == request.getPreviousLog().getIndex(),
      "reply's next index is %s, request's previous is %s",
      replyNextIndex, request.getPreviousLog());
  updateMatchIndex = request.hasPreviousLog() && follower.getMatchIndex() < lastIndex;
} e
{code}
This is quite possible in the system and not a fatal bug.

 

 

> java.lang.IllegalStateException exception in datanode log
> ---------------------------------------------------------
>
>                 Key: HDDS-1124
>                 URL: https://issues.apache.org/jira/browse/HDDS-1124
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Nilotpal Nandi
>            Assignee: Shashikant Banerjee
>            Priority: Major
>
> steps taken :
> --------------------
>  # created 12 datanodes cluster and running workload on all the nodes
> exception seen :
> ---------------------------
>  
> {noformat}
> 2019-02-15 10:15:53,355 INFO org.apache.ratis.server.storage.RaftLogWorker: 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker:
Rolled log segment from /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3036
to /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_3036-3047
> 2019-02-15 10:15:53,367 INFO org.apache.ratis.server.impl.RaftServerImpl: 943007c8-4fdd-4926-89e2-2c8c52c05073:
set configuration 3048: [a40a7b01-a30b-469c-b373-9fcb20a126ed:172.27.54.212:9858, 8c77b16b-8054-49e3-b669-1ff759cfd271:172.27.23.196:9858,
943007c8-4fdd-4926-89e2-2c8c52c05073:172.27.76.72:9858], old=null at 3048
> 2019-02-15 10:15:53,523 INFO org.apache.ratis.server.storage.RaftLogWorker: 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker:
created new log segment /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3048
> 2019-02-15 10:15:53,580 ERROR org.apache.ratis.grpc.server.GrpcLogAppender: Failed onNext
serverReply {
>  requestorId: "943007c8-4fdd-4926-89e2-2c8c52c05073"
>  replyId: "a40a7b01-a30b-469c-b373-9fcb20a126ed"
>  raftGroupId {
>  id: "\001\323\357*\221,O\300\200\266\001#C\327j\333"
>  }
>  success: true
> }
> term: 3
> nextIndex: 3049
> followerCommit: 3047
> java.lang.IllegalStateException: reply's next index is 3049, request's previous is term:
1
> index: 3047
> at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:60)
>  at org.apache.ratis.grpc.server.GrpcLogAppender.onSuccess(GrpcLogAppender.java:285)
>  at org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNextImpl(GrpcLogAppender.java:230)
>  at org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNext(GrpcLogAppender.java:215)
>  at org.apache.ratis.grpc.server.GrpcLogAppender$AppendLogResponseHandler.onNext(GrpcLogAppender.java:197)
>  at org.apache.ratis.thirdparty.io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onMessage(ClientCalls.java:421)
>  at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
>  at org.apache.ratis.thirdparty.io.grpc.ForwardingClientCallListener.onMessage(ForwardingClientCallListener.java:33)
>  at org.apache.ratis.thirdparty.io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1MessagesAvailable.runInContext(ClientCallImpl.java:519)
>  at org.apache.ratis.thirdparty.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
>  at org.apache.ratis.thirdparty.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-02-15 10:15:56,442 INFO org.apache.ratis.server.storage.RaftLogWorker: 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker:
Rolling segment log-3048_3066 to index:3066
> 2019-02-15 10:15:56,442 INFO org.apache.ratis.server.storage.RaftLogWorker: 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker:
Rolled log segment from /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3048
to /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_3048-3066
> 2019-02-15 10:15:56,564 INFO org.apache.ratis.server.storage.RaftLogWorker: 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker:
created new log segment /data/disk1/ozone/meta/ratis/01d3ef2a-912c-4fc0-80b6-012343d76adb/current/log_inprogress_3067
> 2019-02-15 10:16:45,420 INFO org.apache.ratis.server.storage.RaftLogWorker: 943007c8-4fdd-4926-89e2-2c8c52c05073-RaftLogWorker:
Rolling segment log-3067_3077 to index:3077
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message