hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7815) Loop on 'blocks does not belong to any file'
Date Thu, 19 Mar 2015 18:13:38 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369836#comment-14369836 ]

Chris Nauroth commented on HDFS-7815:
-------------------------------------

Hi, [~frha].  You can add this line to your log4j.properties to suppress the block state change
logging:

{code}
log4j.logger.BlockStateChange=WARN
{code}

However, if you're running a distro based on Apache Hadoop 2.6.0, then that version has a
bug that accidentally changed the routing of these log messages.  This was fixed in HDFS-7425,
so subsequent versions won't have this problem.  If you're running that version and the above
doesn't work, then you can do this instead:

{code}
log4j.logger.org.apache.hadoop.hdfs.StateChange=WARN
{code}
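
For illustration, here is a minimal sketch of a log4j.properties fragment that sets both loggers, assuming you would rather not work out which routing your build uses. Only the two logger names come from the lines above; combining them is an assumption, and the second line presumably also quiets any other messages routed through that logger:

{code}
# Usual block state change logger
log4j.logger.BlockStateChange=WARN
# Fallback for builds based on Apache Hadoop 2.6.0 (see HDFS-7425)
log4j.logger.org.apache.hadoop.hdfs.StateChange=WARN
{code}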


> Loop on 'blocks does not belong to any file'
> --------------------------------------------
>
>                 Key: HDFS-7815
>                 URL: https://issues.apache.org/jira/browse/HDFS-7815
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 2.6.0
>         Environment: small cluster on Red Hat. 2 namenodes (HA), 6 datanodes with 19TB disk for hdfs.
>            Reporter: Frode Halvorsen
>
> I am currently experiencing a looping situation:
> The namenode takes approx. 1:50 (min:sec) to log a massive number of lines stating that some
> blocks don't belong to any file. During this time, it is unresponsive to any requests from
> datanodes, and if ZooKeeper had been running, it would have taken the namenode down
> (ssh-fencing: kill).
> When it has finished the 'round', it starts to do some normal work, including telling the
> datanode to delete the blocks. But before the datanode has gotten around to deleting the
> blocks and is about to report back to the namenode, the namenode has started on the next
> round of reporting the same blocks that don't belong to any file. Thus, the datanode gets
> a timeout when reporting block updates for the deleted blocks. And this, of course, repeats
> itself over and over again...
> There are actually two issues, I think:
> 1 - the namenode gets totally unresponsive when reporting the blocks (could this be a
> debug line instead of an INFO line?)
> 2 - the namenode seems to 'forget' that it has already reported those blocks just 2-3
> minutes ago...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
