hadoop-hdfs-issues mailing list archives

From "Wei-Chiu Chuang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9249) NPE thrown if an IOException is thrown in NameNode.<init>
Date Thu, 15 Oct 2015 18:48:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14959400#comment-14959400 ]

Wei-Chiu Chuang commented on HDFS-9249:
---------------------------------------

[~stevel@apache.org] Thanks for the suggestion.
The exception was thrown when authentication is the default (i.e. SIMPLE). I did what you suggested, and instead of an NPE in BackupNode, an IOException is thrown by NameNode. This is because, unlike BackupNode.stop(), NameNode.stop() checks whether namesystem is null. Looking further, I also found other places where an IOException could be thrown.

So I think that, in addition to logging the exception, BackupNode.stop() should also check whether namesystem is null.
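A minimal Java sketch of the failure path and the proposed null check. BaseNode, Namesystem, UnguardedBackupNode and GuardedBackupNode are hypothetical, simplified stand-ins, not the actual Hadoop classes. The sketch assumes, as the stack trace suggests, that the NameNode constructor calls stop() when initialization throws an IOException. Because stop() is dynamically dispatched, the BackupNode override runs before namesystem has been assigned. Note that an NPE thrown from stop() inside the catch block replaces the original IOException, which would explain why the log does not show the root cause.

```java
import java.io.IOException;

/** Hypothetical stand-in for FSNamesystem. */
class Namesystem {
  Object getFSImage() {
    return new Object();
  }
}

/** Hypothetical, simplified model of the NameNode constructor's failure path. */
class BaseNode {
  protected Namesystem namesystem; // remains null if initialize() fails

  BaseNode(boolean failInit) throws IOException {
    try {
      initialize(failInit);
    } catch (IOException e) {
      stop(); // dynamically dispatched: a subclass override runs here
      throw e;
    }
  }

  private void initialize(boolean failInit) throws IOException {
    if (failInit) {
      throw new IOException("simulated initialization failure");
    }
    namesystem = new Namesystem();
  }

  void stop() {
    if (namesystem != null) {
      // NameNode.stop() already guards namesystem like this
    }
  }
}

/** Mirrors the reported bug: dereferences namesystem unconditionally. */
class UnguardedBackupNode extends BaseNode {
  UnguardedBackupNode(boolean failInit) throws IOException {
    super(failInit);
  }

  @Override
  void stop() {
    namesystem.getFSImage(); // NPE here masks the original IOException
    super.stop();
  }
}

/** Applies the proposed null check before touching namesystem. */
class GuardedBackupNode extends BaseNode {
  GuardedBackupNode(boolean failInit) throws IOException {
    super(failInit);
  }

  @Override
  void stop() {
    if (namesystem != null) {
      namesystem.getFSImage(); // only reached after successful initialization
    }
    super.stop();
  }
}
```

With the guard, a failed initialization surfaces as the original IOException instead of an NPE from the cleanup path.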

> NPE thrown if an IOException is thrown in NameNode.<init>
> ---------------------------------------------------------
>
>                 Key: HDFS-9249
>                 URL: https://issues.apache.org/jira/browse/HDFS-9249
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Minor
>              Labels: supportability
>
> This issue was found when running the test case TestBackupNode.testCheckpointNode, but on closer inspection, the problem is not caused by the test case.
> It looks like an IOException was thrown in:
>     try {
>       initializeGenericKeys(conf, nsId, namenodeId);
>       initialize(conf);
>       try {
>         haContext.writeLock();
>         state.prepareToEnterState(haContext);
>         state.enterState(haContext);
>       } finally {
>         haContext.writeUnlock();
>       }
> causing the NameNode to stop before the namesystem was properly instantiated, which led to the NPE.
> I tried to reproduce the problem locally, but without success.
> Because I could not reproduce the bug, and the log does not indicate what caused the IOException, I suggest making this a supportability JIRA that logs the exception for future improvement.
> Stack trace:
> java.lang.NullPointerException: null
> at org.apache.hadoop.hdfs.server.namenode.NameNode.getFSImage(NameNode.java:906)
> at org.apache.hadoop.hdfs.server.namenode.BackupNode.stop(BackupNode.java:210)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:827)
> at org.apache.hadoop.hdfs.server.namenode.BackupNode.<init>(BackupNode.java:89)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1474)
> at org.apache.hadoop.hdfs.server.namenode.TestBackupNode.startBackupNode(TestBackupNode.java:102)
> at org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:298)
> at org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpointNode(TestBackupNode.java:130)
> The last few lines of the log:
> 2015-10-14 19:45:07,807 INFO namenode.NameNode (NameNode.java:createNameNode(1422)) - createNameNode [-checkpoint]
> 2015-10-14 19:45:07,807 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:init(158)) - CheckpointNode metrics system started (again)
> 2015-10-14 19:45:07,808 INFO namenode.NameNode (NameNode.java:setClientNamenodeAddress(402)) - fs.defaultFS is hdfs://localhost:37835
> 2015-10-14 19:45:07,808 INFO namenode.NameNode (NameNode.java:setClientNamenodeAddress(422)) - Clients are to use localhost:37835 to access this namenode/service.
> 2015-10-14 19:45:07,810 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1708)) - Shutting down the Mini HDFS Cluster
> 2015-10-14 19:45:07,810 INFO namenode.FSNamesystem (FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for active state
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog (FSEditLog.java:endCurrentLogSegment(1228)) - Ending log segment 1
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem (FSNamesystem.java:run(5306)) - NameNodeEditLogRoller was interrupted, exiting
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(703)) - Number of transactions: 3 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 4 SyncTimes(ms): 2 1
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem (FSNamesystem.java:run(5373)) - LazyPersistFileScrubber was interrupted, exiting
> 2015-10-14 19:45:07,822 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_0000000000000000001 -> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_0000000000000000001-0000000000000000003
> 2015-10-14 19:45:07,835 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_inprogress_0000000000000000001 -> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_0000000000000000001-0000000000000000003
> 2015-10-14 19:45:07,836 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(169)) - Shutting down CacheReplicationMonitor
> 2015-10-14 19:45:07,836 INFO ipc.Server (Server.java:stop(2485)) - Stopping server on 37835
> 2015-10-14 19:45:07,837 INFO ipc.Server (Server.java:run(718)) - Stopping IPC Server listener on 37835
> 2015-10-14 19:45:07,837 INFO ipc.Server (Server.java:run(844)) - Stopping IPC Server Responder
> 2015-10-14 19:45:07,837 INFO blockmanagement.BlockManager (BlockManager.java:run(3781)) - Stopping ReplicationMonitor.
> 2015-10-14 19:45:07,838 WARN blockmanagement.DecommissionManager (DecommissionManager.java:run(78)) - Monitor interrupted: java.lang.InterruptedException: sleep interrupted
> 2015-10-14 19:45:07,844 INFO namenode.FSNamesystem (FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for active state
> 2015-10-14 19:45:07,845 INFO namenode.FSNamesystem (FSNamesystem.java:stopStandbyServices(1386)) - Stopping services started for standby state
> 2015-10-14 19:45:07,848 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:0
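The description proposes logging the IOException so that the root cause is not lost. Below is a minimal sketch of one way to do that, assuming java.util.logging as a stand-in for whatever logger NameNode actually uses. LoggingNode and its boolean flags are hypothetical. The sketch logs the exception before running cleanup, and if cleanup itself throws, it attaches that failure with Throwable.addSuppressed, so the original IOException remains the one that propagates. This is an illustrative pattern, not the committed HDFS-9249 patch.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical node whose cleanup may itself fail, as BackupNode.stop() did. */
class LoggingNode {
  private static final Logger LOG = Logger.getLogger(LoggingNode.class.getName());

  LoggingNode(boolean failInit, boolean failStop) throws IOException {
    try {
      initialize(failInit);
    } catch (IOException e) {
      // Log the root cause first so it is recorded even if cleanup fails.
      LOG.log(Level.SEVERE, "Initialization failed", e);
      try {
        stop(failStop);
      } catch (RuntimeException cleanupFailure) {
        // Keep the IOException as the primary exception; attach the cleanup
        // failure instead of letting it replace the original.
        e.addSuppressed(cleanupFailure);
      }
      throw e;
    }
  }

  private void initialize(boolean failInit) throws IOException {
    if (failInit) {
      throw new IOException("simulated initialization failure");
    }
  }

  private void stop(boolean failStop) {
    if (failStop) {
      throw new NullPointerException("simulated failure inside stop()");
    }
  }
}
```

Combined with the null check in stop(), this would keep both the original exception and any secondary cleanup failure visible in the test log.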



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
