hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4867) metaSave NPEs when there are invalid blocks in repl queue.
Date Fri, 31 May 2013 16:28:21 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671624#comment-13671624 ]

Kihwal Lee commented on HDFS-4867:

bq. I think I am seeing the case in branch-2; did your error look something like this?
Yes, that is the same bug.

bq. Also, are you able to successfully reproduce it by chance?
I first saw it happen in safe mode and then during a massive decommissioning. In the former,
ReplicationMonitor does not process the neededReplication queue, so these blocks are not thrown
away. In the latter case, it does run but can't get to those blocks in time, since it limits
the number of blocks it processes in one iteration.

Detecting this condition is simple, but we need to think about what to do with it. Maybe
metaSave should throw them away like ReplicationMonitor would, if running in a non-startup safemode.
Outside safemode, it could just report them, since ReplicationMonitor will eventually do the job.
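
The handling suggested above could look roughly like the following Java sketch. It is not the actual NameNode code: MetaSaveSketch, dumpQueue, the blockToFile map, and the discardOrphans flag are hypothetical stand-ins, with a plain map lookup taking the place of the inode lookup. It only illustrates checking for a missing owner before dereferencing it, then either reporting the block or also dropping it from the queue.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a metaSave-style dump; not the real NameNode classes. */
class MetaSaveSketch {

    /**
     * Builds one report line per queued block.
     * blockToFile is an assumed lookup (block id -> owning file path); a missing
     * entry models an orphaned/invalid block whose inode cannot be found.
     * discardOrphans is an assumed flag, e.g. set when in non-startup safe mode.
     */
    static String dumpQueue(List<Long> queuedBlockIds,
                            Map<Long, String> blockToFile,
                            boolean discardOrphans) {
        StringBuilder report = new StringBuilder();
        Iterator<Long> it = queuedBlockIds.iterator();
        while (it.hasNext()) {
            long id = it.next();
            String owner = blockToFile.get(id);
            if (owner == null) {
                // Report the block instead of dereferencing a null owner.
                report.append("blk_").append(id).append(": orphaned/invalid");
                if (discardOrphans) {
                    it.remove(); // safe removal while iterating
                    report.append(" (removed from queue)");
                }
                report.append('\n');
                continue;
            }
            report.append("blk_").append(id).append(": ").append(owner).append('\n');
        }
        return report.toString();
    }

    public static void main(String[] args) {
        List<Long> queue = new ArrayList<>(List.of(1L, 2L, 3L));
        Map<Long, String> owners = Map.of(1L, "/a", 3L, "/c");
        System.out.print(dumpQueue(queue, owners, true));
        System.out.println("remaining: " + queue);
    }
}
```

With discardOrphans set to false the queue is left untouched and the orphaned block is only reported, which matches the "just report" option outside safemode.
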
> metaSave NPEs when there are invalid blocks in repl queue.
> ----------------------------------------------------------
>                 Key: HDFS-4867
>                 URL: https://issues.apache.org/jira/browse/HDFS-4867
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 0.23.7, 2.0.4-alpha
>            Reporter: Kihwal Lee
>            Assignee: Ravi Prakash
> Since metaSave cannot get the inode holding an orphaned/invalid block, it NPEs and stops
generating the rest of the report. Normally ReplicationMonitor removes them quickly, but if the queue
is huge, it takes a very long time. Also, in safe mode, they stay.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
