hadoop-hdfs-issues mailing list archives

From "Wei-Chiu Chuang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-9249) NPE thrown if an IOException is thrown in NameNode.<init>
Date Mon, 02 Nov 2015 22:55:27 GMT

     [ https://issues.apache.org/jira/browse/HDFS-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HDFS-9249:
----------------------------------
    Attachment: HDFS-9249.004.patch

I had an offline discussion with [~yzhangal] and learned a lot about supportability.

Rev4 addresses his comments:
Rather than ignoring the null pointer, catch the NPE at its caller. This makes it
easier to see that the IOException and the NPE are related. I also added comments to the
test case to make it easier to understand.
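
For readers following along, here is a minimal sketch of the idea, not the code in the patch. All class and method names (StartupFailureSketch, Node, Namesystem, tryStart) are hypothetical stand-ins rather than the real Hadoop classes, and attaching the NPE with addSuppressed is just one assumed way of keeping the two exceptions together:

```java
import java.io.IOException;

// Hypothetical stand-ins; none of these are the real Hadoop classes.
class StartupFailureSketch {

  /** Plays the role of the namesystem, which is only set once initialization succeeds. */
  static class Namesystem {
    String getFSImage() { return "fsimage"; }
  }

  static class Node {
    private Namesystem namesystem; // stays null if initialize() fails early

    Node(boolean failInit) throws IOException {
      try {
        initialize(failInit);
      } catch (IOException e) {
        try {
          stop();
        } catch (NullPointerException npe) {
          // Caught at the caller of stop(): keep the NPE attached to the
          // original failure so both show up in the same stack trace.
          e.addSuppressed(npe);
        }
        throw e; // the IOException stays the primary error
      }
    }

    private void initialize(boolean fail) throws IOException {
      if (fail) {
        throw new IOException("simulated startup failure");
      }
      namesystem = new Namesystem();
    }

    void stop() {
      // No null check here on purpose: the null pointer is not ignored.
      namesystem.getFSImage();
    }
  }

  /** Returns the exception thrown while constructing a Node, or null on success. */
  static Throwable tryStart(boolean failInit) {
    try {
      new Node(failInit);
      return null;
    } catch (Throwable t) {
      return t;
    }
  }
}
```

With this shape, a failed startup surfaces the IOException as the primary error and the NPE as a secondary one, instead of the NPE replacing the IOException as in the reported stack trace.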

> NPE thrown if an IOException is thrown in NameNode.<init>
> ---------------------------------------------------------
>
>                 Key: HDFS-9249
>                 URL: https://issues.apache.org/jira/browse/HDFS-9249
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.7.1
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Minor
>              Labels: supportability
>         Attachments: HDFS-9249.001.patch, HDFS-9249.002.patch, HDFS-9249.003.patch, HDFS-9249.004.patch
>
>
> This issue was found when running the test case TestBackupNode.testCheckpointNode, but on closer inspection the problem is not caused by the test case.
> It looks like an IOException was thrown in
>     try {
>       initializeGenericKeys(conf, nsId, namenodeId);
>       initialize(conf);
>       try {
>         haContext.writeLock();
>         state.prepareToEnterState(haContext);
>         state.enterState(haContext);
>       } finally {
>         haContext.writeUnlock();
>       }
> causing the NameNode to stop, but the namesystem was not yet properly instantiated, which caused the NPE.
> I tried to reproduce it locally, but to no avail.
> Because I could not reproduce the bug, and the log does not indicate what caused the IOException, I suggest making this a supportability JIRA to log the exception for future improvement.
> Stacktrace
> java.lang.NullPointerException: null
> at org.apache.hadoop.hdfs.server.namenode.NameNode.getFSImage(NameNode.java:906)
> at org.apache.hadoop.hdfs.server.namenode.BackupNode.stop(BackupNode.java:210)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:827)
> at org.apache.hadoop.hdfs.server.namenode.BackupNode.<init>(BackupNode.java:89)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1474)
> at org.apache.hadoop.hdfs.server.namenode.TestBackupNode.startBackupNode(TestBackupNode.java:102)
> at org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:298)
> at org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpointNode(TestBackupNode.java:130)
> The last few lines of log:
> 2015-10-14 19:45:07,807 INFO namenode.NameNode (NameNode.java:createNameNode(1422)) - createNameNode [-checkpoint]
> 2015-10-14 19:45:07,807 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:init(158)) - CheckpointNode metrics system started (again)
> 2015-10-14 19:45:07,808 INFO namenode.NameNode (NameNode.java:setClientNamenodeAddress(402)) - fs.defaultFS is hdfs://localhost:37835
> 2015-10-14 19:45:07,808 INFO namenode.NameNode (NameNode.java:setClientNamenodeAddress(422)) - Clients are to use localhost:37835 to access this namenode/service.
> 2015-10-14 19:45:07,810 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdown(1708)) - Shutting down the Mini HDFS Cluster
> 2015-10-14 19:45:07,810 INFO namenode.FSNamesystem (FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for active state
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog (FSEditLog.java:endCurrentLogSegment(1228)) - Ending log segment 1
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem (FSNamesystem.java:run(5306)) - NameNodeEditLogRoller was interrupted, exiting
> 2015-10-14 19:45:07,811 INFO namenode.FSEditLog (FSEditLog.java:printStatistics(703)) - Number of transactions: 3 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 4 SyncTimes(ms): 2 1
> 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem (FSNamesystem.java:run(5373)) - LazyPersistFileScrubber was interrupted, exiting
> 2015-10-14 19:45:07,822 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_0000000000000000001 -> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_0000000000000000001-0000000000000000003
> 2015-10-14 19:45:07,835 INFO namenode.FileJournalManager (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_inprogress_0000000000000000001 -> /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_0000000000000000001-0000000000000000003
> 2015-10-14 19:45:07,836 INFO blockmanagement.CacheReplicationMonitor (CacheReplicationMonitor.java:run(169)) - Shutting down CacheReplicationMonitor
> 2015-10-14 19:45:07,836 INFO ipc.Server (Server.java:stop(2485)) - Stopping server on 37835
> 2015-10-14 19:45:07,837 INFO ipc.Server (Server.java:run(718)) - Stopping IPC Server listener on 37835
> 2015-10-14 19:45:07,837 INFO ipc.Server (Server.java:run(844)) - Stopping IPC Server Responder
> 2015-10-14 19:45:07,837 INFO blockmanagement.BlockManager (BlockManager.java:run(3781)) - Stopping ReplicationMonitor.
> 2015-10-14 19:45:07,838 WARN blockmanagement.DecommissionManager (DecommissionManager.java:run(78)) - Monitor interrupted: java.lang.InterruptedException: sleep interrupted
> 2015-10-14 19:45:07,844 INFO namenode.FSNamesystem (FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for active state
> 2015-10-14 19:45:07,845 INFO namenode.FSNamesystem (FSNamesystem.java:stopStandbyServices(1386)) - Stopping services started for standby state
> 2015-10-14 19:45:07,848 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
