hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1999) DataNodes can become dead nodes when running 'dfsadmin finalizeUpgrade'
Date Mon, 08 Oct 2007 21:06:50 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533201 ]

Konstantin Shvachko commented on HADOOP-1999:

Finalize removes the hard links previously created by upgrade. The removal is done in a separate thread, but if there are a lot of blocks, then data-nodes are likely to be blocked on I/O, that is, data transmission will be slow. This is what you observed here.
A solution would be to remove the links lazily, e.g. remove 100 files per second or so. Then finalizing will go slower, but the data-nodes will be able to proceed with normal activities.
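
To illustrate the lazy-removal idea, here is a minimal sketch in plain Java. It is not Hadoop code: the class ThrottledRemover, the method removeLazily, and the parameters batchSize and pauseMillis are hypothetical names chosen for this example. It deletes files in small batches and sleeps between batches, so that other I/O on the same disks gets a chance to run.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper, not part of Hadoop: removes files in throttled batches. */
final class ThrottledRemover {

    private ThrottledRemover() {}

    /**
     * Deletes the given files, at most batchSize per batch, sleeping
     * pauseMillis between batches. Returns the number of files that
     * actually existed and were removed.
     */
    static int removeLazily(List<Path> files, int batchSize, long pauseMillis)
            throws IOException, InterruptedException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int removed = 0;
        for (int i = 0; i < files.size(); i++) {
            // deleteIfExists returns false when the file is already gone.
            if (Files.deleteIfExists(files.get(i))) {
                removed++;
            }
            boolean batchDone = (i + 1) % batchSize == 0;
            boolean moreLeft = i + 1 < files.size();
            // Pause only between batches, never after the last file.
            if (batchDone && moreLeft) {
                Thread.sleep(pauseMillis);
            }
        }
        return removed;
    }
}
```

With batchSize = 100 and pauseMillis = 1000 this approximates the "100 files per second" rate mentioned above, assuming each batch of deletes completes quickly compared with the pause. The actual rate would need tuning against real data-node load.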

The jstack you attached: I do not see the data-node doing any file deletes. Are you sure this thread dump was taken during finalize? I do see that one of the threads is doing DU, though. Could the slowdown be related to HADOOP-1946?
Before that was fixed I saw drastic slowdowns of data-nodes; some of them would become dead even under insignificant load.
Finalize would make things even worse.

Missing blocks: I suspect that you get these because many I/O operations did not complete. Some blocks were not replicated, some files were not closed.

> DataNodes can become dead nodes when running 'dfsadmin finalizeUpgrade'
> -----------------------------------------------------------------------
>                 Key: HADOOP-1999
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1999
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.15.0
>         Environment: Sep 14 nightly build
>            Reporter: Christian Kunz
>            Priority: Critical
>         Attachments: jstack.datanode
> I restarted namenode with -upgrade option, started a few scripts running hadoop command line utility to upload a few files into dfs, and ran at some time
> hadoop dfsadmin -finalizeUpgrade.
> At this time all the dfs clients I started before got stuck during block transmission.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
