hadoop-hdfs-issues mailing list archives

From "Walter Su (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9313) Possible NullPointerException in BlockManager if no excess replica can be chosen
Date Tue, 27 Oct 2015 03:07:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14975592#comment-14975592 ]

Walter Su commented on HDFS-9313:
---------------------------------

I'm OK with adding a {{null}} check. However, I don't think it's enough to address the scenario
here. In the test case, you added 1 SSD + 3 DISKs. As you said in the patch:
{code}
// In this case,
// no replica can't be chosen as the excessive replica as
// chooseReplicasToDelete only considers storages[4] and storages[5] that
// are the same rack. But neither's storage type is SSD.
{code}
If we choose nothing, the replica on SSD won't be deleted. And as I remember, {{Mover}} won't
do it either, since the existing storage types already contain the expected ones.

Instead of choosing nothing, we should choose the SSD, since the remaining 3 DISKs are already
on enough racks.
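
To make the suggestion concrete, here is a rough, self-contained sketch of the selection rule I have in mind. It is not the real {{BlockPlacementPolicyDefault}} code: the {{Replica}} class, the {{chooseExcessReplica}} method, the {{minRacks}} parameter and the rack layout in {{main}} are all hypothetical names and values made up for illustration. The first pass mimics the current behavior of only looking at racks that hold more than one replica. The fallback then picks any replica of an excess storage type, as long as the remaining replicas still span enough racks.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Simplified model for illustration only; none of these types are Hadoop classes.
class ExcessReplicaSketch {
    enum StorageType { SSD, DISK }

    static final class Replica {
        final String node;
        final String rack;
        final StorageType type;

        Replica(String node, String rack, StorageType type) {
            this.node = node;
            this.rack = rack;
            this.type = type;
        }

        @Override
        public String toString() {
            return node + "@" + rack + "[" + type + "]";
        }
    }

    /**
     * Picks one replica to delete, or returns empty if no safe choice exists.
     * minRacks is an assumed parameter: the number of racks that must still
     * hold a replica after the deletion.
     */
    static Optional<Replica> chooseExcessReplica(
            List<Replica> replicas, Set<StorageType> excessTypes, int minRacks) {
        // Count how many replicas live on each rack.
        Map<String, Integer> replicasPerRack = new HashMap<>();
        for (Replica r : replicas) {
            replicasPerRack.merge(r.rack, 1, Integer::sum);
        }

        // First pass: only racks with more than one replica, so that
        // deleting the replica never reduces the number of racks.
        for (Replica r : replicas) {
            if (replicasPerRack.get(r.rack) > 1 && excessTypes.contains(r.type)) {
                return Optional.of(r);
            }
        }

        // Fallback: accept a replica that is alone on its rack, provided
        // the remaining replicas still cover at least minRacks racks.
        for (Replica r : replicas) {
            if (!excessTypes.contains(r.type)) {
                continue;
            }
            boolean aloneOnRack = replicasPerRack.get(r.rack) == 1;
            int racksAfter = replicasPerRack.size() - (aloneOnRack ? 1 : 0);
            if (racksAfter >= minRacks) {
                return Optional.of(r);
            }
        }

        // Callers must handle "nothing chosen" explicitly instead of
        // dereferencing a null result.
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical layout: 1 SSD + 3 DISKs, two of the DISKs share a rack.
        List<Replica> replicas = Arrays.asList(
                new Replica("dn1", "/rackA", StorageType.SSD),
                new Replica("dn2", "/rackB", StorageType.DISK),
                new Replica("dn3", "/rackC", StorageType.DISK),
                new Replica("dn4", "/rackC", StorageType.DISK));
        System.out.println(
                chooseExcessReplica(replicas, EnumSet.of(StorageType.SSD), 2));
    }
}
```

Returning an {{Optional}} here is only one way to force the caller to deal with the empty case; a {{null}} check in the caller, as in the patch, serves the same purpose.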

> Possible NullPointerException in BlockManager if no excess replica can be chosen
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-9313
>                 URL: https://issues.apache.org/jira/browse/HDFS-9313
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ming Ma
>            Assignee: Ming Ma
>         Attachments: HDFS-9313.patch
>
>
> HDFS-8647 makes it easier to reason about various block placement scenarios. Here is
> one possible case where BlockManager won't be able to find the excess replica to delete: when
> the storage policy changes around the same time the balancer moves the block. When this happens,
> it causes a NullPointerException.
> {noformat}
> java.lang.NullPointerException
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy.adjustSetsWithChosenReplica(BlockPlacementPolicy.java:156)
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseReplicasToDelete(BlockPlacementPolicyDefault.java:978)
> {noformat}
> Note that it hasn't been found in any production clusters. Instead, it was found by new unit
> tests. In addition, the issue existed before HDFS-8647.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
