hadoop-hdfs-issues mailing list archives

From "amith (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3332) NullPointerException in DN when directoryscanner is trying to report bad blocks
Date Mon, 30 Apr 2012 10:30:48 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13264867#comment-13264867 ]

amith commented on HDFS-3332:
-----------------------------

Hi Nicholas,
Please correct me if I am wrong :)

I started the NN with an HA configuration (nn1=40.95 and nn2=40.96; nn2 is not started).

I started only one NN, made it active, wrote a file, and corrupted it manually.
The DirectoryScanner reports the bad block to all the NNs via BPServiceActor.

Here, BPServiceActor#reportBadBlocks(ExtendedBlock block) does not check whether the DN has successfully
registered with the NN.
It tries to report the bad block using bpRegistration, which is null, and this causes the NPE.
{code}
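  void reportBadBlocks(ExtendedBlock block) {
    // bpRegistration is still null if register() has not run for this NN,
    // so constructing DatanodeInfo from it throws the NPE in the stack trace
    DatanodeInfo[] dnArr = { new DatanodeInfo(bpRegistration) };
    LocatedBlock[] blocks = { new LocatedBlock(block, dnArr) };
    // ... (rest of the method not quoted)
  }
{code}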
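A minimal sketch of why `new DatanodeInfo(bpRegistration)` fails, assuming (as the `DatanodeID.<init>` frame at the top of the stack trace suggests) that the copy constructor reads fields from its argument. `Registration`, `NodeInfo`, and `throwsNpe` are hypothetical stand-ins, not Hadoop classes; they model only the dereference of a possibly-null registration.

```java
// Hypothetical stand-ins, not Hadoop classes: they model only the detail that
// copying from a registration dereferences it.
public class NullRegistrationRepro {

    /** Stand-in for the DN's registration with one NN (hypothetical). */
    static final class Registration {
        final String ipAddr;

        Registration(String ipAddr) {
            this.ipAddr = ipAddr;
        }
    }

    /** Stand-in for a copy constructor like DatanodeInfo(DatanodeID) (assumed). */
    static final class NodeInfo {
        final String ipAddr;

        NodeInfo(Registration from) {
            // Reading a field through a null reference throws NullPointerException.
            this.ipAddr = from.ipAddr;
        }
    }

    /** Returns true if copying from {@code reg} throws a NullPointerException. */
    static boolean throwsNpe(Registration reg) {
        try {
            new NodeInfo(reg);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("registered:   " + throwsNpe(new Registration("10.18.40.95")));
        System.out.println("unregistered: " + throwsNpe(null));
    }
}
```

A registered actor copies its registration without trouble; an actor whose registration was never assigned fails on the very first field access, before any RPC is attempted.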


Why is bpRegistration null?

{code}
  private void connectToNNAndHandshake() throws IOException {
    // get NN proxy
    bpNamenode = dn.connectToNN(nnAddr);

    // First phase of the handshake with NN - get the namespace
    // info.
    NamespaceInfo nsInfo = retrieveNamespaceInfo();

    // Verify that this matches the other NN in this HA pair.
    // This also initializes our block pool in the DN if we are
    // the first NN connection for this BP.
    bpos.verifyAndSetNamespaceInfo(nsInfo);

    // Second phase of the handshake with the NN.
    register();
  }
{code}

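bpRegistration is assigned inside the register() call. However, retrieveNamespaceInfo() is effectively
an infinite loop: it keeps retrying versionRequest() until it succeeds or the DN shuts down. Since nn2 is
not started, the BPServiceActor for nn2 never gets past this loop: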

{code}
  NamespaceInfo retrieveNamespaceInfo() throws IOException {
    NamespaceInfo nsInfo = null;
    while (shouldRun()) {
      try {
        nsInfo = bpNamenode.versionRequest();
        LOG.debug(this + " received versionRequest response: " + nsInfo);
        break;
      } catch (SocketTimeoutException e) {  // namenode is busy
        LOG.warn("Problem connecting to server: " + nnAddr);
      } catch (IOException e) {  // namenode is not available
        LOG.warn("Problem connecting to server: " + nnAddr);
      }

      // try again in 5 seconds
      sleepAndLogInterrupts(5000, "requesting version info from NN");
    }

    if (nsInfo != null) {
      checkNNVersion(nsInfo);
    } else {
      throw new IOException("DN shut down before block pool connected");
    }
    return nsInfo;
  }
{code}
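To make the ordering concrete, here is a simplified, self-contained model of the two-phase handshake. It is not Hadoop code: `HandshakeSketch`, `NamenodeProxy`, `Actor`, and the `maxAttempts` bound are hypothetical, and the registration is modeled as a plain string. The bound exists only so the demo terminates; in the scenario above the loop keeps retrying for as long as the DN runs.

```java
import java.io.IOException;

// Simplified model, not Hadoop code: the registration is assigned only after
// the version request succeeds, so an actor whose NN never answers keeps
// registration == null while it retries.
public class HandshakeSketch {

    /** Hypothetical NN proxy; versionRequest() fails while the NN is down. */
    interface NamenodeProxy {
        String versionRequest() throws IOException;
    }

    static final class Actor {
        private final NamenodeProxy nn;
        private final int maxAttempts; // demo-only bound so main() terminates
        private volatile boolean running = true;
        volatile String registration;   // null until phase 2 runs
        int failedAttempts;

        Actor(NamenodeProxy nn, int maxAttempts) {
            this.nn = nn;
            this.maxAttempts = maxAttempts;
        }

        void connectToNNAndHandshake() throws IOException {
            String nsInfo = retrieveNamespaceInfo(); // phase 1
            registration = "registered:" + nsInfo;   // phase 2, stands in for register()
        }

        private String retrieveNamespaceInfo() throws IOException {
            while (running) {
                try {
                    return nn.versionRequest();
                } catch (IOException e) {
                    failedAttempts++; // NN not available; the real loop sleeps and retries
                }
                if (failedAttempts >= maxAttempts) {
                    running = false;  // simulate DN shutdown after a few tries
                }
            }
            throw new IOException("DN shut down before block pool connected");
        }
    }

    public static void main(String[] args) {
        Actor nn1 = new Actor(() -> "ns-info", 3);
        Actor nn2 = new Actor(() -> { throw new IOException("connection refused"); }, 3);
        for (Actor a : new Actor[] {nn1, nn2}) {
            try {
                a.connectToNNAndHandshake();
            } catch (IOException e) {
                System.out.println("handshake aborted: " + e.getMessage());
            }
        }
        System.out.println("nn1 registration = " + nn1.registration);
        System.out.println("nn2 registration = " + nn2.registration);
    }
}
```

While the nn2 actor is still inside the retry loop, any other thread that reads its `registration` sees null, which is exactly the window the DirectoryScanner hits.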

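So for nn2, register() is never called and bpRegistration stays null; when the DirectoryScanner reports the bad block to every NN, the nn2 actor's reportBadBlocks() throws the NPE.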
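One possible guard, sketched under assumptions: the actor checks its registration before building the report and skips NNs it has not registered with yet, so the exception never reaches the scanner thread. This is not necessarily how HDFS-3332 was resolved; `GuardedReportSketch`, `Actor`, `reportBadBlock`, and `reportToAll` are hypothetical names, and the RPC to the NN is replaced by adding to a list.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of one possible guard (an assumption, not necessarily the HDFS-3332 fix):
// an actor that has not registered yet skips the report instead of letting a
// NullPointerException escape into the DirectoryScanner thread.
public class GuardedReportSketch {

    /** Hypothetical per-NN actor; registration stays null until register() runs. */
    static final class Actor {
        final String nnName;
        volatile String registration;
        final List<String> reported = new ArrayList<>();

        Actor(String nnName, String registration) {
            this.nnName = nnName;
            this.registration = registration;
        }

        /** Reports the block and returns true, or returns false if not registered yet. */
        boolean reportBadBlock(String block) {
            String reg = registration; // read the volatile field once
            if (reg == null) {
                System.out.println("Skipping bad block " + block + " for " + nnName
                        + ": not registered yet");
                return false;
            }
            reported.add(block + "@" + reg); // stands in for the RPC to the NN
            return true;
        }
    }

    /** Fans one bad block out to every actor, as the report goes to all NNs. */
    static int reportToAll(List<Actor> actors, String block) {
        int sent = 0;
        for (Actor a : actors) {
            if (a.reportBadBlock(block)) {
                sent++;
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        Actor nn1 = new Actor("nn1", "dn-registration");
        Actor nn2 = new Actor("nn2", null); // nn2 was never started
        int sent = reportToAll(List.of(nn1, nn2), "blk_-4466699320171028643_1004");
        System.out.println("reported to " + sent + " of 2 NNs");
    }
}
```

With the guard, the report still reaches nn1 and the scanner thread sees no exception. Whether silently skipping is acceptable, or the report should instead be retried once nn2 registers, is a design question this sketch leaves open.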

                
> NullPointerException in DN when directoryscanner is trying to report bad blocks
> -------------------------------------------------------------------------------
>
>                 Key: HDFS-3332
>                 URL: https://issues.apache.org/jira/browse/HDFS-3332
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 3.0.0
>         Environment: HDFS
>            Reporter: amith
>            Assignee: amith
>             Fix For: 3.0.0
>
>
> There is one NN and one DN (the NN is started with an HA configuration).
> I corrupted one block and found the following:
> {code}
> 2012-04-27 09:59:01,214 INFO  datanode.DataNode (BPServiceActor.java:blockReport(401)) - BlockReport of 2 blocks took 0 msec to generate and 5 msecs for RPC and NN processing
> 2012-04-27 09:59:01,214 INFO  datanode.DataNode (BPServiceActor.java:blockReport(420)) - sent block report, processed command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@3b756db3
> 2012-04-27 09:59:01,726 INFO  datanode.DirectoryScanner (DirectoryScanner.java:scan(390)) - BlockPool BP-2087868617-10.18.40.95-1335500488012 Total blocks: 2, missing metadata files:0, missing block files:0, missing blocks in memory:0, mismatched blocks:1
> 2012-04-27 09:59:01,727 WARN  impl.FsDatasetImpl (FsDatasetImpl.java:checkAndUpdate(1366)) - Updating size of block -4466699320171028643 from 1024 to 1034
> 2012-04-27 09:59:01,727 WARN  impl.FsDatasetImpl (FsDatasetImpl.java:checkAndUpdate(1374)) - Reporting the block blk_-4466699320171028643_1004 as corrupt due to length mismatch
> 2012-04-27 09:59:01,728 DEBUG ipc.Client (Client.java:sendParam(807)) - IPC Client (1957050620) connection to /10.18.40.95:8020 from root sending #257
> 2012-04-27 09:59:01,730 DEBUG ipc.Client (Client.java:receiveResponse(848)) - IPC Client (1957050620) connection to /10.18.40.95:8020 from root got value #257
> 2012-04-27 09:59:01,730 DEBUG ipc.ProtobufRpcEngine (ProtobufRpcEngine.java:invoke(193)) - Call: reportBadBlocks 2
> 2012-04-27 09:59:01,731 ERROR datanode.DirectoryScanner (DirectoryScanner.java:run(288)) - Exception during DirectoryScanner execution - will continue next cycle
> java.lang.NullPointerException
> 	at org.apache.hadoop.hdfs.protocol.DatanodeID.<init>(DatanodeID.java:66)
> 	at org.apache.hadoop.hdfs.protocol.DatanodeInfo.<init>(DatanodeInfo.java:87)
> 	at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.reportBadBlocks(BPServiceActor.java:238)
> 	at org.apache.hadoop.hdfs.server.datanode.BPOfferService.reportBadBlocks(BPOfferService.java:187)
> 	at org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:559)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.checkAndUpdate(FsDatasetImpl.java:1377)
> 	at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:318)
> 	at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:284)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> 	at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
> 	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:619)
> {code}
> Here, the DirectoryScanner hit an NPE while trying to report the bad block.

--
This message is automatically generated by JIRA.

        
