hadoop-hdfs-issues mailing list archives

From "Yiqun Lin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-14503) ThrottledAsyncChecker throws NPE during block pool initialization
Date Tue, 21 May 2019 06:20:00 GMT

     [ https://issues.apache.org/jira/browse/HDFS-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiqun Lin updated HDFS-14503:
-----------------------------
    Description: 
ThrottledAsyncChecker throws an NPE during block pool initialization, which causes block pool registration to fail.

The exception:
{noformat}
2019-05-20 01:02:36,003 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected exception in block pool Block pool <registering> (Datanode Uuid xxxxx) service to xx.xx.xx.xx/xx.xx.xx.xx
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker$LastCheckResult.access$000(ThrottledAsyncChecker.java:211)
        at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker.schedule(ThrottledAsyncChecker.java:129)
        at org.apache.hadoop.hdfs.server.datanode.checker.DatasetVolumeChecker.checkAllVolumes(DatasetVolumeChecker.java:209)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3387)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1508)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:272)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:768)
        at java.lang.Thread.run(Thread.java:745)
{noformat}

This error appears to occur because the {{WeakHashMap}} {{completedChecks}} has already removed the target entry by the time we get it. Although we check for the entry before getting it, there is still a chance that the get returns null.
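A minimal sketch of the suspected check-then-act pattern, under stated assumptions: {{LastCheckLookup}}, its {{LastCheckResult}} stand-in and the {{VanishingMap}} test double are hypothetical and are not Hadoop's code. {{VanishingMap}} drops the entry right after {{containsKey}}, which deterministically reproduces an entry disappearing between the two calls. The single {{get}} plus null check is shown as one defensive alternative, not necessarily the change that was committed for this issue.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration, not Hadoop code: two ways to read the last
 * check result for a volume before deciding whether to re-check it.
 */
public class LastCheckLookup {

  /** Hypothetical stand-in for ThrottledAsyncChecker.LastCheckResult. */
  static final class LastCheckResult {
    final long completedAt;

    LastCheckResult(long completedAt) {
      this.completedAt = completedAt;
    }
  }

  /**
   * Test double: reports whether the key is present, then removes it, so the
   * entry is gone by the time a following get() runs.
   */
  static final class VanishingMap<K, V> extends HashMap<K, V> {
    @Override
    public boolean containsKey(Object key) {
      boolean present = super.containsKey(key);
      remove(key);
      return present;
    }
  }

  /** Check-then-act: NPE if the entry vanishes between containsKey and get. */
  static boolean shouldSkipUnsafe(Map<Object, LastCheckResult> completed,
      Object target, long nowMs, long minGapMs) {
    if (completed.containsKey(target)) {
      LastCheckResult result = completed.get(target); // may be null here
      return nowMs - result.completedAt < minGapMs;
    }
    return false;
  }

  /** Single lookup plus null check: a missing entry just means "check now". */
  static boolean shouldSkipSafe(Map<Object, LastCheckResult> completed,
      Object target, long nowMs, long minGapMs) {
    LastCheckResult result = completed.get(target);
    return result != null && nowMs - result.completedAt < minGapMs;
  }

  public static void main(String[] args) {
    Object volume = new Object();

    Map<Object, LastCheckResult> racy = new VanishingMap<>();
    racy.put(volume, new LastCheckResult(1_000L));
    try {
      shouldSkipUnsafe(racy, volume, 1_500L, 600L);
      System.out.println("unsafe: no exception");
    } catch (NullPointerException e) {
      System.out.println("unsafe: NullPointerException"); // this branch runs
    }

    Map<Object, LastCheckResult> racy2 = new VanishingMap<>();
    racy2.put(volume, new LastCheckResult(1_000L));
    // prints "safe: skip=true" (500ms since last check < 600ms gap)
    System.out.println("safe: skip=" + shouldSkipSafe(racy2, volume, 1_500L, 600L));

    // prints "safe, no entry: skip=false"
    System.out.println("safe, no entry: skip="
        + shouldSkipSafe(new HashMap<>(), volume, 1_500L, 600L));
  }
}
```

The safe variant reads the entry once, so there is no window in which a second call can observe it missing; a null result is treated the same way as "no previous check".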

We hit this in a corner case: in federation mode, with two block pools on the DN, {{ThrottledAsyncChecker}} schedules two identical health checks for the same volume:
{noformat}
2019-05-20 01:02:36,000 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for /hadoop/2/hdfs/data/current
2019-05-20 01:02:36,000 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for /hadoop/2/hdfs/data/current
{noformat}
After {{completedChecks#get}} is called, {{completedChecks}} cleans up the entry for one successful check. The other check then gets null for the same volume.
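The two-block-pool scenario can be modelled with a simplified, synchronous throttler. This is a hypothetical sketch, not Hadoop's {{ThrottledAsyncChecker}}: the class {{VolumeCheckThrottler}}, the injected clock and {{minGapMs}} (standing in for the minimum gap between checks) are assumptions for illustration, and the disk check itself is reduced to a counter. Two threads play the two block pools requesting a check of the same volume; because the last result is read with a single {{get}} and null-checked inside a synchronized method, the second request is throttled instead of dereferencing a missing entry.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;

/**
 * Hypothetical, simplified throttler (not Hadoop's ThrottledAsyncChecker):
 * runs a check for a volume at most once per minGapMs, even when several
 * block pools ask for the same volume at the same time.
 */
public class VolumeCheckThrottler {
  // Weak keys as in the issue; callers hold the volume strongly while checking.
  private final Map<Object, Long> lastCompletedAt = new WeakHashMap<>();
  private final LongSupplier clock;
  private final long minGapMs;
  final AtomicInteger checksRun = new AtomicInteger();

  VolumeCheckThrottler(LongSupplier clock, long minGapMs) {
    this.clock = clock;
    this.minGapMs = minGapMs;
  }

  /** Returns true if a check was run, false if it was throttled. */
  synchronized boolean schedule(Object volume) {
    Long last = lastCompletedAt.get(volume); // single lookup, may be null
    long now = clock.getAsLong();
    if (last != null && now - last < minGapMs) {
      return false; // checked too recently, skip
    }
    checksRun.incrementAndGet(); // stand-in for the real disk check
    lastCompletedAt.put(volume, now);
    return true;
  }

  public static void main(String[] args) throws InterruptedException {
    // Fixed clock: both requests arrive "at the same millisecond".
    VolumeCheckThrottler throttler = new VolumeCheckThrottler(() -> 1_000L, 600L);
    Object volume = new Object(); // e.g. /hadoop/2/hdfs/data/current
    Thread bp1 = new Thread(() -> throttler.schedule(volume), "bp-1");
    Thread bp2 = new Thread(() -> throttler.schedule(volume), "bp-2");
    bp1.start();
    bp2.start();
    bp1.join();
    bp2.join();
    System.out.println("checks run: " + throttler.checksRun.get()); // prints "checks run: 1"
  }
}
```

With the clock fixed, exactly one check runs regardless of which block pool thread acquires the lock first.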


  was:
ThrottledAsyncChecker throws an NPE during block pool initialization, which causes block pool registration to fail.

The exception:
{noformat}
2019-05-20 01:02:36,003 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Unexpected exception in block pool Block pool <registering> (Datanode Uuid xxxxx) service to xx.xx.xx.xx/xx.xx.xx.xx
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker$LastCheckResult.access$000(ThrottledAsyncChecker.java:211)
        at org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker.schedule(ThrottledAsyncChecker.java:129)
        at org.apache.hadoop.hdfs.server.datanode.checker.DatasetVolumeChecker.checkAllVolumes(DatasetVolumeChecker.java:209)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.checkDiskError(DataNode.java:3387)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1508)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:319)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:272)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:768)
        at java.lang.Thread.run(Thread.java:745)
{noformat}

This error appears to occur because the {{WeakHashMap}} {{completedChecks}} has already removed the target entry by the time we get it. Although we check for the entry before getting it, there is still a chance that the get returns null.

We hit this in a corner case: in federation mode, with two block pools on the DN, {{ThrottledAsyncChecker}} schedules two identical health checks for the same volume:
{noformat}
2019-05-20 01:02:36,000 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for /hadoop/2/hdfs/data/current
2019-05-20 01:02:36,000 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for /hadoop/2/hdfs/data/current
{noformat}




> ThrottledAsyncChecker throws NPE during block pool initialization 
> ------------------------------------------------------------------
>
>                 Key: HDFS-14503
>                 URL: https://issues.apache.org/jira/browse/HDFS-14503
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.3.0
>            Reporter: Yiqun Lin
>            Priority: Major



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


