hadoop-yarn-issues mailing list archives

From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-1778) TestFSRMStateStore fails on trunk
Date Tue, 03 Feb 2015 19:56:35 GMT

    [ https://issues.apache.org/jira/browse/YARN-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303866#comment-14303866 ]

zhihai xu commented on YARN-1778:
---------------------------------

Hi [~jlowe], thanks for the information. I think relying on HDFS client retries leaves a lot of corner cases to cover, and it may not be easy to cover all of them. For example, in YARN-2820 we hit an HDFS IOException after the HDFS client had retried dfsClient.namenode.complete, which is a low-level retry inside FileSystemRMStateStore#updateFile, as the following log shows:
{code}
2014-10-29 23:49:12,202 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Updating info for attempt: appattempt_1409135750325_109118_000001 at: /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_000001

2014-10-29 23:49:19,495 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_000001.new.tmp retrying...

2014-10-29 23:49:23,757 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_000001.new.tmp retrying...

2014-10-29 23:49:31,120 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/hadoop-yarn/yarn/system/rmstore/FSRMStateRoot/RMAppRoot/application_1409135750325_109118/appattempt_1409135750325_109118_000001.new.tmp retrying...

2014-10-29 23:49:46,283 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore: Error updating info for attempt: appattempt_1409135750325_109118_000001
java.io.IOException: Unable to close file because the last block does not have enough number of replicas.
2014-10-29 23:49:46,284 ERROR org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Error storing/updating appAttempt: appattempt_1409135750325_109118_000001
2014-10-29 23:49:46,916 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Received a org.apache.hadoop.yarn.server.resourcemanager.RMFatalEvent of type STATE_STORE_OP_FAILED.
{code}
The HDFS client retry is a low-level retry: it does not know how the upper layer uses it. IMO, it makes sense to also retry the whole operation in the upper layer, similar to how retries are done at different network layers: at the physical layer, the link layer, and the TCP/IP layer.
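To illustrate the idea of an upper-layer, whole-operation retry, here is a minimal sketch. It is not the FileSystemRMStateStore code; the class `UpperLayerRetry`, the interface `IOOperation`, and the parameters `maxAttempts` and `retryIntervalMs` are hypothetical names chosen for this example. The assumption is that the operation passed in is the complete unit of work (for example, write the `.new.tmp` file, close it, and rename it into place), so each retry starts from scratch instead of resuming a half-finished low-level call.

```java
import java.io.IOException;

public class UpperLayerRetry {

    /** Hypothetical whole-operation unit of work that may fail with an IOException. */
    @FunctionalInterface
    public interface IOOperation<T> {
        T run() throws IOException;
    }

    /**
     * Runs the whole operation up to maxAttempts times, sleeping retryIntervalMs
     * between failed attempts. Rethrows the last IOException if every attempt fails.
     */
    public static <T> T retry(IOOperation<T> op, int maxAttempts, long retryIntervalMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Re-run the complete operation, not just the low-level call that failed.
                return op.run();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(retryIntervalMs);
                }
            }
        }
        throw last; // non-null here: at least one attempt ran and failed
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated store update that fails twice (as the close() in the log did), then succeeds.
        String result = retry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IOException("Unable to close file because the last block does not have enough number of replicas.");
            }
            return "stored";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Running `main` prints `stored after 3 attempts`. The design choice is that the upper layer owns the retry budget and interval, so a failure that exhausts the HDFS client's internal retries can still be recovered by redoing the whole state-store update rather than escalating straight to a fatal `STATE_STORE_OP_FAILED` event.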



> TestFSRMStateStore fails on trunk
> ---------------------------------
>
>                 Key: YARN-1778
>                 URL: https://issues.apache.org/jira/browse/YARN-1778
>             Project: Hadoop YARN
>          Issue Type: Test
>            Reporter: Xuan Gong
>            Assignee: zhihai xu
>         Attachments: YARN-1778.000.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
