hadoop-hdfs-issues mailing list archives

From "Uma Maheswara Rao G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11284) [SPS]: Avoid running SPS under safemode and fix issues in target node choosing.
Date Tue, 03 Jan 2017 22:37:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796425#comment-15796425 ]

Uma Maheswara Rao G commented on HDFS-11284:

Hi [~yuanbo], retry does not happen in the DN itself.
When a DN reports a movement result as a failure, the NN takes care of the retry. At that point,
if the NN finds that all existing blocks are already satisfied, those items are not sent for
movement again. If they still need satisfaction, the NN sends them again after finding new
sources and targets. The default retry timeout was set to 30 minutes. (The timeout is high
because a DN on a node with low processing power sometimes takes longer to send results back,
and the NN would then retry unnecessarily. This can be refined further with testing.)
Hope this helps you understand better.
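
The retry decision described above can be sketched roughly as follows. This is only an
illustration of my reading of the flow, not the actual SPS code: the class RetrySketch, the
Action enum and the decide() method are hypothetical names, and treating an expired timeout
the same way as a reported failure is an assumption.

```java
/** Hypothetical sketch of the NN-side retry decision; not the real SPS classes. */
class RetrySketch {
    /** Default retry timeout mentioned in the comment: 30 minutes. */
    static final long RETRY_TIMEOUT_MS = 30L * 60 * 1000;

    enum Action {
        WAIT,       // no failure reported and timeout not reached: keep waiting for the DN
        SKIP,       // blocks already satisfy the policy: do not send for movement again
        RESCHEDULE  // still unsatisfied: find new sources/targets and send again
    }

    /**
     * @param failureReported  the DN reported the movement result as a failure
     * @param elapsedMs        time since the movement was handed to the DN
     * @param alreadySatisfied the NN finds the existing blocks already satisfied
     */
    static Action decide(boolean failureReported, long elapsedMs, boolean alreadySatisfied) {
        // Assumption: a timeout is handled like a failure report.
        boolean needsRecheck = failureReported || elapsedMs >= RETRY_TIMEOUT_MS;
        if (!needsRecheck) {
            return Action.WAIT;
        }
        // The NN, not the DN, re-evaluates the item before retrying.
        return alreadySatisfied ? Action.SKIP : Action.RESCHEDULE;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, 0L, true));
        System.out.println(decide(true, 0L, false));
        System.out.println(decide(false, 1000L, false));
    }
}
```

The long timeout only affects the WAIT branch: a slow DN gets more time to report before the
NN re-evaluates the item.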

Agree, I will go back to HDFS-11150. Since #2 has been addressed, the last issue seems to belong
to the retry mechanism. I'm thinking about removing/changing this JIRA.
Please keep this JIRA open until you agree on the reason.
Can you confirm one point from your logs: whether the block was deleted due to over-replication
and whether the same node was used for the movement (as the movement was scheduled before)?
If that's the case, the behavior should be fine. Also, can you confirm from the logs that the
remaining block movements were successful?
Anyway, please go ahead with HDFS-11150. There were some test failures related to it; can
you please check?

Thanks a lot for putting in the effort.


> [SPS]: Avoid running SPS under safemode and fix issues in target node choosing.
> -------------------------------------------------------------------------------
>                 Key: HDFS-11284
>                 URL: https://issues.apache.org/jira/browse/HDFS-11284
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>            Reporter: Yuanbo Liu
>            Assignee: Yuanbo Liu
>         Attachments: TestSatisfier.java
> Recently I've found that under some conditions, SPS is not stable:
> * SPS runs under safe mode.
> * There are some overlapping nodes among the chosen target nodes.
> * The real replication number of a block doesn't match the replication factor. For example, the real replication is 2 while the replication factor is 3.
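
The first two conditions in the description could be guarded against along these lines. This
is a minimal sketch under stated assumptions, not the patch attached to this issue: the class
TargetChoiceSketch and the method chooseTargets() are hypothetical, datanodes are represented
as plain strings, and the number of targets needed is supplied by the caller (who would base
it on the real replica count rather than the replication factor).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; names and types are not taken from the HDFS code base. */
class TargetChoiceSketch {
    /**
     * Picks up to `needed` distinct targets from `candidates`, in order.
     * Returns an empty list when the NN is in safe mode, so nothing is scheduled.
     */
    static List<String> chooseTargets(boolean inSafeMode, List<String> candidates, int needed) {
        List<String> chosen = new ArrayList<>();
        if (inSafeMode || needed <= 0) {
            return chosen; // do not run SPS work under safe mode
        }
        Set<String> seen = new HashSet<>();
        for (String node : candidates) {
            if (chosen.size() == needed) {
                break;
            }
            // Set.add returns false for a node that was already picked,
            // which prevents overlapping targets.
            if (seen.add(node)) {
                chosen.add(node);
            }
        }
        return chosen;
    }
}
```

If fewer distinct candidates exist than requested, the returned list is simply shorter, and the
caller has to decide whether to retry later.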
