hadoop-mapreduce-issues mailing list archives

From "liuxiaoping (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6745) Job directories should be cleaned in the staging directory /tmp/hadoop-yarn/staging after a MapReduce job finishes successfully
Date Thu, 01 Sep 2016 11:52:20 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15455156#comment-15455156 ]

liuxiaoping commented on MAPREDUCE-6745:
----------------------------------------

When a task fails but the job is successful, the .staging directory shouldn't be kept. I think it
is a good idea to add a parameter "mapreduce.tasks.files.preserve.failedjobs".
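A minimal sketch of the proposed check, assuming a new key named "mapreduce.tasks.files.preserve.failedjobs"
(hypothetical; it does not exist in Hadoop today) and using the existing JobStatus.State enum:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.JobStatus;

    public class PreserveFailedJobsSketch {
        // Keep the job's .staging directory only when the proposed flag is set
        // AND the job did not succeed; successful jobs are always cleaned up.
        static boolean keepStagingDir(Configuration conf, JobStatus.State finalState) {
            boolean preserveFailedJobs =
                conf.getBoolean("mapreduce.tasks.files.preserve.failedjobs", false); // proposed key
            return preserveFailedJobs && finalState != JobStatus.State.SUCCEEDED;
        }
    }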

> Job directories should be cleaned in the staging directory /tmp/hadoop-yarn/staging after a MapReduce
job finishes successfully
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6745
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6745
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am
>    Affects Versions: 2.7.2
>         Environment: Suse 11 sp3
>            Reporter: liuxiaoping
>            Priority: Blocker
>
> If the MapReduce client sets mapreduce.task.files.preserve.failedtasks=true, the temporary job
directory will not be deleted from the staging directory /tmp/hadoop-yarn/staging.
> As time goes by, the job files accumulate, eventually leading to the exception below:
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$MaxDirectoryItemExceededException):
> The directory item limit of /tmp/hadoop-yarn/staging/username/.staging is exceeded: limit=1048576
items=1048576
> 		at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:936)
> 		at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addLastINode(FSDirectory.java:981)
> 		at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.unprotectedMkdir(FSDirMkdirOp.java:237)
> 		at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.createSingleDirectory(FSDirMkdirOp.java:191)
> 		at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.createChildrenDirectories(FSDirMkdirOp.java:166)
> 		at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:97)
> 		at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3788)
> 		at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:986)
> 		at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:624)
> 		at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolProtos.$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:624)
> 		at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> 		at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:973)
> 		at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2088)
> 		at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2084)
> 		at java.security.auth.Subject.doAs(Subject.java:422)
> 		at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1672)
> 		at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2082)
> 		
> 		
> The official description of the configuration mapreduce.task.files.preserve.failedtasks
is as follows:
>     Should the files for failed tasks be kept. This should only be used on jobs that
are failing, because the storage is never reclaimed. 
>     It also prevents the map outputs from being erased from the reduce directory as they
are consumed.
> 	
> According to this description, I think the temporary files for successful tasks shouldn't
be kept.
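
For reference, the cleanup decision the report describes can be sketched as below. This is a
simplified illustration, not the actual MRAppMaster source: the configuration keys (referenced via
MRJobConfig) are real, while the class and method names are made up for the sketch. The 1048576
limit in the exception above corresponds to the NameNode setting dfs.namenode.fs-limits.max-directory-items,
so raising that limit only delays the failure.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.MRJobConfig;

    // Illustrative sketch only: when preserve-failed-task-files (or a preserve
    // pattern) is set, the job directory under .staging is kept even for
    // successful jobs, so entries accumulate run after run.
    class StagingCleanupSketch {
        static void maybeCleanStagingDir(Configuration conf, Path jobStagingDir)
                throws IOException {
            boolean keepFiles =
                conf.getBoolean(MRJobConfig.PRESERVE_FAILED_TASK_FILES, false)
                || conf.get(MRJobConfig.PRESERVE_FILES_PATTERN) != null;
            if (keepFiles) {
                return; // staging directory is left in place regardless of job outcome
            }
            FileSystem fs = jobStagingDir.getFileSystem(conf);
            fs.delete(jobStagingDir, true);
        }
    }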



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
