hadoop-mapreduce-issues mailing list archives

From "Arun Suresh (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MAPREDUCE-7101) Revisit behavior of JHS scan file behavior
Date Tue, 12 Jun 2018 21:05:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16510203#comment-16510203 ]

Arun Suresh edited comment on MAPREDUCE-7101 at 6/12/18 9:04 PM:
-----------------------------------------------------------------

Thanks [~tmarquardt].
The patch looks good to me. +1. The comment describing the new field in JHAdminConfig is wrong,
but that is a minor thing I can fix before committing.

Will wait till EOD before committing, in case anyone has issues with the patch.

This patch retains the default behavior, and specific cloud deployments can choose to always
scan. A pluggable, FS-specific scan is probably a better long-term solution, but I agree with
[~leftnoteasy] and [~rohithsharma] that we should go ahead with the approach in this patch
to unblock.
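The "pluggable FS-specific scan" idea can be pictured roughly as follows. This is a hypothetical sketch only: `ScanPolicy`, `ModTimeScanPolicy`, `AlwaysScanPolicy` and `ScanPolicyRegistry` are invented names, not part of Hadoop or of the attached patch, and choosing a policy by URI scheme is just one possible way to make the decision FS-specific. The example paths are made up.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a pluggable, FS-specific "should we scan?" decision.
// None of these types exist in Hadoop; they only illustrate the idea.
public class PluggableScanSketch {

    interface ScanPolicy {
        boolean shouldScan(long lastScannedModTime, long currentModTime);
    }

    // Default: trust the directory modification time (the existing behavior).
    static final class ModTimeScanPolicy implements ScanPolicy {
        @Override
        public boolean shouldScan(long lastScannedModTime, long currentModTime) {
            return lastScannedModTime != currentModTime;
        }
    }

    // For stores whose directory modification times cannot be trusted.
    static final class AlwaysScanPolicy implements ScanPolicy {
        @Override
        public boolean shouldScan(long lastScannedModTime, long currentModTime) {
            return true;
        }
    }

    // Maps a URI scheme to a policy; unknown or missing schemes use the default.
    static final class ScanPolicyRegistry {
        private final Map<String, ScanPolicy> byScheme = new HashMap<>();
        private final ScanPolicy fallback = new ModTimeScanPolicy();

        void register(String scheme, ScanPolicy policy) {
            byScheme.put(scheme.toLowerCase(Locale.ROOT), policy);
        }

        ScanPolicy forUri(URI uri) {
            String scheme = uri.getScheme();
            if (scheme == null) {
                return fallback;
            }
            return byScheme.getOrDefault(scheme.toLowerCase(Locale.ROOT), fallback);
        }
    }

    public static void main(String[] args) {
        ScanPolicyRegistry registry = new ScanPolicyRegistry();
        registry.register("s3a", new AlwaysScanPolicy());

        URI hdfsDir = URI.create("hdfs://nn:8020/mr-history/done_intermediate");
        URI s3Dir = URI.create("s3a://bucket/mr-history/done_intermediate");

        // Same (unchanged) mod time: the HDFS policy skips, the S3A policy scans.
        System.out.println(registry.forUri(hdfsDir).shouldScan(0L, 0L)); // false
        System.out.println(registry.forUri(s3Dir).shouldScan(0L, 0L));   // true
    }
}
```

The default stays mod-time based, so only file systems that are explicitly registered change behavior, which mirrors the "retain the default, opt in per deployment" approach taken by the patch.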

was (Author: asuresh):
Thanks [~tmarquardt].
The patch looks good to me. +1. The comment describing the new field in JHAdminConfig is wrong,
but that is a minor thing I can fix before committing.

This patch retains the default behavior, and specific cloud deployments can choose to always
scan. A pluggable, FS-specific scan is probably a better long-term solution, but I agree with
[~leftnoteasy] and [~rohithsharma] that we should go ahead with the approach in this patch
to unblock.

> Revisit behavior of JHS scan file behavior
> ------------------------------------------
>
>                 Key: MAPREDUCE-7101
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7101
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Assignee: Thomas Marquardt
>            Priority: Critical
>         Attachments: MAPREDUCE-7101.001.patch
>
>
> Currently, the JHS scans a directory only if the *directory's* modification time has changed:
> {code} 
>     public synchronized void scanIfNeeded(FileStatus fs) {
>       long newModTime = fs.getModificationTime();
>       if (modTime != newModTime) {
>         <... some logic omitted ...>
>         // reset scanTime before scanning happens
>         scanTime = System.currentTimeMillis();
>         Path p = fs.getPath();
>         try {
>           scanIntermediateDirectory(p);
> {code}
> This logic assumes that a directory's modification time is updated whenever a file is placed
under that directory.
> However, the semantics of a directory's modification time are not consistent across
FS implementations. For example, MAPREDUCE-6680 fixed some issues with truncated modification
times, and HADOOP-12837 noted that on S3 a directory's modification time is always 0.
> I think we need to revisit this logic so that it works more robustly across different
file systems.
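To make the failure mode concrete, here is a small, self-contained Java model of the check quoted above. It is a sketch under stated assumptions, not the JHS code: `ScanTracker`, `countScans` and the `alwaysScan` flag are hypothetical names (the flag only mimics the opt-in "always scan" behavior discussed in the comment, not the actual JHAdminConfig field), the tracked mod time is assumed to start at 0, and the scan itself is reduced to a counter.

```java
// Hypothetical, simplified model of the modification-time check in scanIfNeeded.
// ScanTracker and alwaysScan are illustrative names, not Hadoop APIs.
public class ScanIfNeededSketch {

    static class ScanTracker {
        private final boolean alwaysScan; // false preserves the mod-time-based default
        private long modTime = 0;         // assumed initial value of the tracked mod time
        private int scanCount = 0;

        ScanTracker(boolean alwaysScan) {
            this.alwaysScan = alwaysScan;
        }

        // newModTime plays the role of fs.getModificationTime() for the directory.
        synchronized void scanIfNeeded(long newModTime) {
            if (alwaysScan || modTime != newModTime) {
                scanCount++;          // stands in for scanIntermediateDirectory(p)
                modTime = newModTime; // remember the mod time we scanned at
            }
        }

        synchronized int getScanCount() {
            return scanCount;
        }
    }

    // Feeds a sequence of observed directory mod times to a fresh tracker.
    static int countScans(boolean alwaysScan, long[] observedModTimes) {
        ScanTracker tracker = new ScanTracker(alwaysScan);
        for (long t : observedModTimes) {
            tracker.scanIfNeeded(t);
        }
        return tracker.getScanCount();
    }

    public static void main(String[] args) {
        // A file system that updates the directory mod time when files land:
        // the observed value changes on polls 1, 2 and 4.
        long[] bumping = {1000L, 2000L, 2000L, 3000L};
        // A file system that always reports 0 for directories (the S3 case
        // mentioned in HADOOP-12837), even though new files keep arriving.
        long[] alwaysZero = {0L, 0L, 0L, 0L};

        System.out.println("bumping, default:     " + countScans(false, bumping));
        System.out.println("always 0, default:    " + countScans(false, alwaysZero));
        System.out.println("always 0, alwaysScan: " + countScans(true, alwaysZero));
    }
}
```

Running it prints 3, 0 and 4 scans respectively: when the directory mod time never changes, the default check never scans at all, while forcing a scan on every poll finds new files at the cost of extra listing calls.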

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


