From hdfs-issues-return-239992-archive-asf-public=cust-asf.ponee.io@hadoop.apache.org Thu Oct 25 19:52:06 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id 80CD2180675 for ; Thu, 25 Oct 2018 19:52:05 +0200 (CEST) Received: (qmail 37009 invoked by uid 500); 25 Oct 2018 17:52:04 -0000 Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list hdfs-issues@hadoop.apache.org Received: (qmail 36998 invoked by uid 99); 25 Oct 2018 17:52:04 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 25 Oct 2018 17:52:04 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 01633C1DF6 for ; Thu, 25 Oct 2018 17:52:04 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -109.501 X-Spam-Level: X-Spam-Status: No, score=-109.501 tagged_above=-999 required=6.31 tests=[ENV_AND_HDR_SPF_MATCH=-0.5, KAM_ASCII_DIVIDERS=0.8, RCVD_IN_DNSWL_MED=-2.3, SPF_PASS=-0.001, USER_IN_DEF_SPF_WL=-7.5, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id ff_-0D618Ty8 for ; Thu, 25 Oct 2018 17:52:01 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 77B875F524 for ; Thu, 25 Oct 2018 17:52:01 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id EEABDE25AA for ; Thu, 25 Oct 2018 17:52:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 797B82669F for ; Thu, 25 Oct 2018 17:52:00 +0000 (UTC) Date: Thu, 25 Oct 2018 17:52:00 +0000 (UTC) From: "Arpit Agarwal (JIRA)" To: hdfs-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Resolved] (HDDS-543) datanode service start throwing NullPointerException MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HDDS-543?page=3Dcom.atlassian.= jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal resolved HDDS-543. -------------------------------- Resolution: Done This should be fixed as part of the restart fixes per [~msingh]. > datanode service start throwing NullPointerException > ---------------------------------------------------- > > Key: HDDS-543 > URL: https://issues.apache.org/jira/browse/HDDS-543 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Reporter: Nilotpal Nandi > Assignee: Mukul Kumar Singh > Priority: Critical > Attachments: all-node-ozone-logs-1537784799.tar.gz > > > steps taken : > ------------------- > # Stopped one of the nodes in 6 node cluster=C2=A0 while write operation= was running. > # Tried to start datanode after sometime. Datanode failed to start > =C2=A0 > ozone version : > --------------------- > =C2=A0 > {noformat} > Source code repository git@github.com:apache/hadoop.git -r 968082ffa5d9e5= 0ed8538f653c375edd1b8feea5 > Compiled by elek on 2018-09-19T20:57Z > Compiled with protoc 2.5.0 > From source with checksum efbdeabb5670d69d9efde85846e4ee98 > Using HDDS 0.2.1-alpha > Source code repository git@github.com:apache/hadoop.git -r 968082ffa5d9e5= 0ed8538f653c375edd1b8feea5 > Compiled by elek on 2018-09-19T20:56Z > Compiled with protoc 2.5.0 > From source with checksum 8bf78cff4b73c95d486da5b21053ef > =C2=A0 > {noformat} > =C2=A0 > console log : > -------------------- > =C2=A0 > {noformat} > 2018-09-24 09:53:54,913 INFO impl.RaftServerImpl: 9348f4cb-0bd3-41e7-b1c6= -c22463a20c09: set configuration 0: [0ba4abe0-c8d1-4d6c-8d28-ebfa08fb84c9:1= 72.27.18.18:9858, 3d6c8b2c-f1d2-4425-a577-b9ef5dc7c634:172.27.80.23:9858, 9= 348f4cb-0bd3-41e7-b1c6-c22463a20c09:172.27.23.161:9858], old=3Dnull at 0 > Sep 24, 2018 9:53:55 AM com.sun.jersey.server.impl.application.WebApplica= tionImpl _initiate > INFO: Initiating Jersey application, version 'Jersey: 1.19 02/11/2015 03:= 25 AM' > 2018-09-24 09:53:55,968 [main] INFO - Listening HDDS REST traffic on /0.0= .0.0:9880 > 2018-09-24 09:53:55,969 [main] INFO - Started plug-in org.apache.hadoop.o= zone.web.OzoneHddsDatanodeService@2fa3be26 > 2018-09-24 09:53:55,975 [Datanode State Machine Thread - 0] INFO - Attemp= ting to start container services. > 2018-09-24 09:53:56,171 [Datanode State Machine Thread - 0] INFO - Starti= ng XceiverServerRatis 9348f4cb-0bd3-41e7-b1c6-c22463a20c09 at port 9858 > 2018-09-24 09:53:56,193 [Datanode State Machine Thread - 0] ERROR - Unabl= e to start the DatanodeState Machine > java.io.IOException: Premature EOF from inputStream > at org.apache.ratis.util.IOUtils.readFully(IOUtils.java:100) > at org.apache.ratis.server.storage.LogReader.decodeEntry(LogReader.java:= 250) > at org.apache.ratis.server.storage.LogReader.readEntry(LogReader.java:15= 5) > at org.apache.ratis.server.storage.LogInputStream.nextEntry(LogInputStre= am.java:128) > at org.apache.ratis.server.storage.LogSegment.readSegmentFile(LogSegment= .java:110) > at org.apache.ratis.server.storage.LogSegment.loadSegment(LogSegment.jav= a:132) > at org.apache.ratis.server.storage.RaftLogCache.loadSegment(RaftLogCache= .java:110) > at org.apache.ratis.server.storage.SegmentedRaftLog.loadLogSegments(Segm= entedRaftLog.java:155) > at org.apache.ratis.server.storage.SegmentedRaftLog.open(SegmentedRaftLo= g.java:123) > at org.apache.ratis.server.impl.ServerState.initLog(ServerState.java:162= ) > at org.apache.ratis.server.impl.ServerState.(ServerState.java:110) > at org.apache.ratis.server.impl.RaftServerImpl.(RaftServerImpl.jav= a:106) > at org.apache.ratis.server.impl.RaftServerProxy.lambda$newRaftServerImpl= $0(RaftServerProxy.java:191) > at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFut= ure.java:1590) > at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFu= ture.java:1582) > at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) > at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java= :1056) > at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) > at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.ja= va:157) > 2018-09-24 09:53:56,195 [main] ERROR - Exception in HddsDatanodeService. > java.lang.NullPointerException > at org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMa= chine.join(DatanodeStateMachine.java:332) > at org.apache.hadoop.ozone.HddsDatanodeService.join(HddsDatanodeService.= java:191) > at org.apache.hadoop.ozone.HddsDatanodeService.main(HddsDatanodeService.= java:250) > 2018-09-24 09:53:56,198 INFO util.ExitUtil: Exiting with status 1: java.l= ang.NullPointerException > 2018-09-24 09:53:56,201 WARN fs.CachingGetSpaceUsed: Thread Interrupted w= aiting to refresh disk information: sleep interrupted > 2018-09-24 09:53:56,201 [shutdown-hook-0] INFO - SHUTDOWN_MSG: > /************************************************************ > SHUTDOWN_MSG: Shutting down HddsDatanodeService at ctr-e138-1518143905142= -468367-01-000003.hwx.site/172.27.23.161 > =C2=A0 > {noformat} > =C2=A0 > datanode.log > ---------------------- > {noformat} > 2018-09-24 09:42:41,305 ERROR org.apache.ratis.server.impl.RaftServerImpl= : 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: applyTransaction failed for index:2= proto:(t:2, i:2)SMLOGENTRY, client-DBCE827E62E6, cid=3D1 > 2018-09-24 09:42:41,307 ERROR org.apache.ratis.server.impl.StateMachineUp= dater: Terminating with exit status 2: StateMachineUpdater-9348f4cb-0bd3-41= e7-b1c6-c22463a20c09: the StateMachineUpdater hits Throwable > java.lang.NullPointerException > at org.apache.hadoop.ozone.container.common.transport.server.ratis.Conta= inerStateMachine$StateMachineHelper.handleChunkCommit(ContainerStateMachine= .java:561) > at org.apache.hadoop.ozone.container.common.transport.server.ratis.Conta= inerStateMachine$StateMachineHelper.executeContainerCommand(ContainerStateM= achine.java:615) > at org.apache.hadoop.ozone.container.common.transport.server.ratis.Conta= inerStateMachine.applyTransaction(ContainerStateMachine.java:394) > at org.apache.ratis.server.impl.RaftServerImpl.applyLogToStateMachine(Ra= ftServerImpl.java:1150) > at org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpda= ter.java:148) > at java.lang.Thread.run(Thread.java:745) > 2018-09-24 09:42:41,312 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Th= read Interrupted waiting to refresh disk information: sleep interrupted > 2018-09-24 09:53:38,901 INFO org.apache.hadoop.metrics2.impl.MetricsConfi= g: Loaded properties from hadoop-metrics2.properties > 2018-09-24 09:53:39,013 INFO org.apache.hadoop.metrics2.impl.MetricsSyste= mImpl: Scheduled Metric snapshot period at 10 second(s). > 2018-09-24 09:53:39,013 INFO org.apache.hadoop.metrics2.impl.MetricsSyste= mImpl: HddsDatanode metrics system started > 2018-09-24 09:53:39,056 INFO org.apache.hadoop.conf.Configuration.depreca= tion: No unit for ozone.scm.heartbeat.rpc-timeout(1000) assuming MILLISECON= DS > 2018-09-24 09:53:39,069 INFO org.apache.hadoop.conf.Configuration.depreca= tion: No unit for hdds.write.lock.reporting.threshold.ms(5000) assuming MIL= LISECONDS > 2018-09-24 09:53:39,069 INFO org.apache.hadoop.conf.Configuration.depreca= tion: No unit for hdds.lock.suppress.warning.interval.ms(10000) assuming MI= LLISECONDS > 2018-09-24 09:53:39,751 WARN org.apache.hadoop.hdds.scm.HddsServerUtil: S= torage directory for Ratis is not configured.Mapping Ratis storage under /d= ata/disk1/ozone/meta/ratis. It is a good idea to map this to an SSD disk. > 2018-09-24 09:53:39,783 INFO org.apache.ratis.server.impl.RaftServerProxy= : raft.rpc.type =3D GRPC (default) > 2018-09-24 09:53:39,793 INFO org.apache.ratis.grpc.GrpcConfigKeys$Server:= raft.grpc.server.port =3D 9858 (custom) > 2018-09-24 09:53:39,793 INFO org.apache.ratis.grpc.RaftGRpcService: raft.= grpc.message.size.max =3D 167772160 (custom) > 2018-09-24 09:53:39,795 INFO org.apache.ratis.server.RaftServerConfigKeys= : raft.server.log.appender.buffer.capacity =3D 134217728 (custom) > 2018-09-24 09:53:39,795 INFO org.apache.ratis.grpc.RaftGRpcService: raft.= grpc.flow.control.window =3D 1MB (=3D1048576) (default) > 2018-09-24 09:53:39,796 INFO org.apache.ratis.server.RaftServerConfigKeys= : raft.server.rpc.request.timeout =3D 3000 ms (custom) > 2018-09-24 09:53:39,835 INFO org.apache.ratis.server.RaftServerConfigKeys= : raft.server.storage.dir =3D /data/disk1/ozone/meta/ratis (custom) > 2018-09-24 09:53:39,836 INFO org.apache.ratis.server.impl.RaftServerProxy= : 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: found a subdirectory /data/disk1/oz= one/meta/ratis/6ff6479b-11f4-456f-b46e-85804b25606d > 2018-09-24 09:53:39,838 INFO org.apache.ratis.server.impl.RaftServerProxy= : 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: addNew group-85804B25606D:[] return= s group-85804B25606D:java.util.concurrent.CompletableFuture@74e28667[Not co= mpleted] > 2018-09-24 09:53:39,850 INFO org.apache.ratis.server.RaftServerConfigKeys= : raft.server.rpc.timeout.min =3D 1 s (custom) > 2018-09-24 09:53:39,850 INFO org.apache.ratis.server.RaftServerConfigKeys= : raft.server.rpc.timeout.max =3D 1200 ms (custom) > 2018-09-24 09:53:39,850 INFO org.apache.ratis.server.RaftServerConfigKeys= : raft.server.rpcslowness.timeout =3D 120 s (custom) > 2018-09-24 09:53:39,868 INFO org.apache.ratis.server.impl.RaftServerImpl:= 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: ConfigurationManager, init=3D-1: [],= old=3Dnull, confs=3D > 2018-09-24 09:53:39,868 INFO org.apache.ratis.server.RaftServerConfigKeys= : raft.server.storage.dir =3D /data/disk1/ozone/meta/ratis (custom) > 2018-09-24 09:53:39,873 INFO org.apache.ratis.server.storage.RaftStorageD= irectory: Lock on /data/disk1/ozone/meta/ratis/6ff6479b-11f4-456f-b46e-8580= 4b25606d/in_use.lock acquired by nodename 3792419@ctr-e138-1518143905142-46= 8367-01-000003.hwx.site > 2018-09-24 09:53:39,878 INFO org.apache.ratis.server.RaftServerConfigKeys= : raft.server.leader.election.timeout =3D 120 s (custom) > 2018-09-24 09:53:39,881 INFO org.apache.ratis.server.RaftServerConfigKeys= : raft.server.log.use.memory =3D false (default) > 2018-09-24 09:53:39,882 INFO org.apache.ratis.server.RaftServerConfigKeys= : raft.server.log.appender.buffer.capacity =3D 134217728 (custom) > 2018-09-24 09:53:39,883 INFO org.apache.ratis.server.RaftServerConfigKeys= : raft.server.log.segment.size.max =3D 1073741824 (custom) > 2018-09-24 09:53:39,886 INFO org.apache.ratis.server.RaftServerConfigKeys= : raft.server.log.segment.cache.num.max =3D 2 (custom) > 2018-09-24 09:53:39,889 INFO org.apache.ratis.server.storage.RaftLogWorke= r: new 9348f4cb-0bd3-41e7-b1c6-c22463a20c09-RaftLogWorker for Storage Direc= tory /data/disk1/ozone/meta/ratis/6ff6479b-11f4-456f-b46e-85804b25606d > 2018-09-24 09:53:39,889 INFO org.apache.ratis.server.RaftServerConfigKeys= : raft.server.log.segment.size.max =3D 1073741824 (custom) > 2018-09-24 09:53:39,889 INFO org.apache.ratis.server.RaftServerConfigKeys= : raft.server.log.preallocated.size =3D 134217728 (custom) > 2018-09-24 09:53:39,890 INFO org.apache.ratis.server.RaftServerConfigKeys= : raft.server.log.write.buffer.size =3D 33554432 (custom) > 2018-09-24 09:53:39,890 INFO org.apache.ratis.server.RaftServerConfigKeys= : raft.server.log.force.sync.num =3D 128 (default) > 2018-09-24 09:53:39,925 INFO org.apache.ratis.server.impl.RaftServerImpl:= 9348f4cb-0bd3-41e7-b1c6-c22463a20c09: set configuration 0: [0ba4abe0-c8d1-= 4d6c-8d28-ebfa08fb84c9:172.27.18.18:9858, 3d6c8b2c-f1d2-4425-a577-b9ef5dc7c= 634:172.27.80.23:9858, 9348f4cb-0bd3-41e7-b1c6-c22463a20c09:172.27.23.161:9= 858], old=3Dnull at 0 > 2018-09-24 09:53:41,058 INFO org.apache.hadoop.util.ExitUtil: Exiting wit= h status 1: java.lang.NullPointerException > 2018-09-24 09:53:41,065 WARN org.apache.hadoop.fs.CachingGetSpaceUsed: Th= read Interrupted waiting to refresh disk information: sleep interrupted{nof= ormat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org