hadoop-hdfs-issues mailing list archives

From "Yiqun Lin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10512) VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks
Date Wed, 29 Jun 2016 02:06:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15354213#comment-15354213 ]

Yiqun Lin commented on HDFS-10512:

From the code in {{FsDatasetImpl}}, I see that {{FsDatasetImpl#getVolume}} returns null,
which causes the NPE. The relevant code:
{code}
  public synchronized FsVolumeImpl getVolume(final ExtendedBlock b) {
    final ReplicaInfo r = volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
    return r != null ? (FsVolumeImpl) r.getVolume() : null;
  }
{code}
This means the {{ReplicaInfo}} of the corrupt block has already been removed from {{volumeMap}}.
Many code paths in {{FsDatasetImpl}} trigger {{volumeMap.remove}}, so I suspect the case
mentioned in HDFS-10587 can lead to this. Can you confirm, [~jojochuang]?
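
To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use the real HDFS classes: {{Volume}}, {{Dataset}}, {{reportUnguarded}}, {{reportGuarded}}, the block key "blk_1" and the storage ID "DS-example" are all hypothetical stand-ins. It only illustrates that a lookup after the replica has been removed from the map returns null, and that a null check before dereferencing avoids the NPE. The guard is one possible approach, not necessarily what the attached patches do.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: every type and method here is a hypothetical stand-in, not an HDFS class. */
public class VolumeLookupSketch {

  /** Stand-in for a volume: exposes only the two getters the report call needs. */
  static final class Volume {
    private final String storageId;
    private final String storageType;

    Volume(String storageId, String storageType) {
      this.storageId = storageId;
      this.storageType = storageType;
    }

    String getStorageID() { return storageId; }
    String getStorageType() { return storageType; }
  }

  /** Stand-in for the dataset: maps a block key to the volume holding its replica. */
  static final class Dataset {
    private final Map<String, Volume> volumeMap = new HashMap<>();

    synchronized void addReplica(String blockKey, Volume volume) {
      volumeMap.put(blockKey, volume);
    }

    synchronized void removeReplica(String blockKey) {
      volumeMap.remove(blockKey);
    }

    /** Like the quoted getVolume: returns null once the replica is gone from the map. */
    synchronized Volume getVolume(String blockKey) {
      return volumeMap.get(blockKey);
    }
  }

  /** Unguarded: dereferences the lookup result directly, so a missing replica throws NPE. */
  static String reportUnguarded(Dataset dataset, String blockKey) {
    Volume volume = dataset.getVolume(blockKey);
    return volume.getStorageID() + "/" + volume.getStorageType();
  }

  /** Guarded (assumed fix): returns null instead of throwing when the replica is missing. */
  static String reportGuarded(Dataset dataset, String blockKey) {
    Volume volume = dataset.getVolume(blockKey);
    if (volume == null) {
      return null; // replica was removed concurrently; the caller can log and skip
    }
    return volume.getStorageID() + "/" + volume.getStorageType();
  }

  public static void main(String[] args) {
    Dataset dataset = new Dataset();
    dataset.addReplica("blk_1", new Volume("DS-example", "DISK"));
    System.out.println("present: " + reportGuarded(dataset, "blk_1"));

    // Simulate another code path removing the replica before the report runs.
    dataset.removeReplica("blk_1");
    System.out.println("removed: " + reportGuarded(dataset, "blk_1"));

    try {
      reportUnguarded(dataset, "blk_1");
    } catch (NullPointerException e) {
      System.out.println("unguarded lookup threw NullPointerException");
    }
  }
}
```

In the sketch the guarded variant lets the calling thread continue, whereas the unguarded one propagates the exception to the caller, which is what makes the scanner thread exit in the stack trace quoted below.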

> VolumeScanner may terminate to due NPE in DataNode.reportBadBlocks
> ------------------------------------------------------------------
>                 Key: HDFS-10512
>                 URL: https://issues.apache.org/jira/browse/HDFS-10512
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Wei-Chiu Chuang
>            Assignee: Yiqun Lin
>         Attachments: HDFS-10512.001.patch, HDFS-10512.002.patch
> VolumeScanner may terminate due to an unexpected NullPointerException thrown in {{DataNode.reportBadBlocks()}}. This is different from HDFS-8850/HDFS-9190.
> I observed this bug in a production CDH 5.5.1 cluster, and the same bug still persists in upstream trunk.
> {noformat}
> 2016-04-07 20:30:53,830 WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: Reporting bad BP-1800173197- on /dfs/dn
> 2016-04-07 20:30:53,831 ERROR org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting because of exception
> java.lang.NullPointerException
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.reportBadBlocks(DataNode.java:1018)
>         at org.apache.hadoop.hdfs.server.datanode.VolumeScanner$ScanResultHandler.handle(VolumeScanner.java:287)
>         at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.scanBlock(VolumeScanner.java:443)
>         at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.runLoop(VolumeScanner.java:547)
>         at org.apache.hadoop.hdfs.server.datanode.VolumeScanner.run(VolumeScanner.java:621)
> 2016-04-07 20:30:53,832 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/dfs/dn, DS-89b72832-2a8c-48f3-8235-48e6c5eb5ab3) exiting.
> {noformat}
> I think the NPE comes from the {{volume}} variable in the following code snippet. Somehow the volume scanner knows the volume, but the DataNode cannot look up the volume using the block.
> {code}
> public void reportBadBlocks(ExtendedBlock block) throws IOException {
>   BPOfferService bpos = getBPOSForBlock(block);
>   FsVolumeSpi volume = getFSDataset().getVolume(block);
>   bpos.reportBadBlocks(
>       block, volume.getStorageID(), volume.getStorageType());
> }
> {code}

This message was sent by Atlassian JIRA

