hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6380) AggregatedLogDeletionService will throw exception when there are some other directories in remote-log-dir
Date Wed, 20 Jul 2016 15:02:21 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385995#comment-15385995

Varun Saxena commented on MAPREDUCE-6380:

bq. Currently listStatus is checking against root log dir. So returned list is user log dirs
like /tmp/logs/userA, /tmp/logs/userB, /tmp/logs/userC.
Yes, the listStatus in FileDeletionTask#run does list against the root log dir. But I am not
talking about that.
I am talking about call to listStatus inside FileDeletionTask#deleteOldLogDirsFrom (the method
where actual deletion happens). Refer to code below based on latest trunk code in AggregatedLogDeletionService:Lines
82-87 (without applying your patch).
82. for(FileStatus userDir : fs.listStatus(remoteRootLogDir)) {
83.  if(userDir.isDirectory()) {
84.    Path userDirPath = new Path(userDir.getPath(), suffix);
85.    deleteOldLogDirsFrom(userDirPath, cutoffMillis, fs, rmClient);
86.  }
87. }
At line 84, userDirPath is created with the suffix(which is logs by default). This path will
be of the form {{/tmp/logs/userA/logs}}.
The change in the patch is to call FileStatus#listStatusIterator on this userDirPath to check
if the path exists or not (and continuing if FileNotFoundException is thrown).
But if you notice on line 85 in code snippet above, we call FileDeletionTask#deleteOldLogDirsFrom,
wherein we pass the same userDirPath which we had created on line 84 (i.e. path of the form

Now, the code in deleteOldLogDirsFrom is something like below.
95.  private static void deleteOldLogDirsFrom(Path dir, long cutoffMillis, 
96.      FileSystem fs, ApplicationClientProtocol rmClient) {
97.    try {
98.       for(FileStatus appDir : fs.listStatus(dir)) {
128.     }
129.   } catch (IOException e) {
130.     logIOException("Could not read the contents of " + dir, e);
131.   }

As you can see in this method, we again call {{listStatus}} on the passed dir which will be
userDirPath (i.e. path of the form {{/tmp/logs/userA/logs}}).
And this is where the FileNotFoundException is thrown (without your changes). And the code
does proceed and take on the next user dir because we do catch this FileNotFoundException
inside deleteOldLogDirsFrom. This JIRA is only about stack trace printed over and over again
I think which is why it is raised as trivial. The issue with app id format is a bigger issue

So my comment was that if we call listStatus already on userDirPath (paths like  {{/tmp/logs/userA/logs}})
inside deleteOldLogDirsFrom then what is the need of calling listStatusIterator on userDirPath
in run() method ? This would lead to an additional RPC call to Namenode in positive case (i.e.
when extraneous directories do not exist).

> AggregatedLogDeletionService will throw exception when there are some other directories
in remote-log-dir
> ---------------------------------------------------------------------------------------------------------
>                 Key: MAPREDUCE-6380
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6380
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>            Reporter: Zhang Wei
>            Assignee: Kai Sasaki
>            Priority: Trivial
>         Attachments: MAPREDUCE-6380.01.patch, MAPREDUCE-6380.02.patch, MAPREDUCE-6380.03.patch,
MAPREDUCE-6380.04.patch, MAPREDUCE-6380.05.patch, MAPREDUCE-6380.06.patch, MAPREDUCE-6380.07.patch
> AggregatedLogDeletionService will throw FileNotFoundException when there are some extraneous
directories put in remote-log-dir. The deletion function will try to listStatus against the
"extraneous-dir + suffix"  dir.  I think it would be better  if  the function can ignore these

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org

View raw message