hadoop-common-issues mailing list archives

From "Jagdish Kewat (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12837) FileStatus.getModificationTime not working on S3
Date Fri, 26 Feb 2016 07:52:18 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168618#comment-15168618 ]

Jagdish Kewat commented on HADOOP-12837:
----------------------------------------

Hi [~cnauroth],

I have a path filter utility that takes a Path as input and returns true if the modification
time of the given path is earlier than a specified time. Here's the method for reference.
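{code}
  // PathFilter callback: accept paths last modified before the cut-off date.
  // "filesystem" (a FileSystem) and "date" (exposing getMillis(), e.g. a Joda DateTime)
  // are fields of the enclosing filter class.
  @Override
  public boolean accept(Path path) {
    try {
      FileStatus status = filesystem.getFileStatus(path);
      // On S3 this returns 0, so every path passes this check.
      if (status.getModificationTime() < this.date.getMillis()) {
        return true;
      }
    } catch (IOException e) {
      // Log the failing path and stack trace; the path is then rejected.
      LOG.error("Could not get file status for " + path, e);
    }
    return false;
  }
{code}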

The actual job processes all the paths for which this returns true. Since the modification
time for S3-based paths is returned as 0, this method returns true for every path, so the job
processes unwanted data. The job doesn't fail; it just produces undesired output.
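A minimal sketch of how the comparison could avoid this, assuming a reported time of 0 (or less) should be read as "unknown" rather than as 1 January 1970. `ModTimeCheck`, `knownModTime` and `olderThan` are hypothetical names, and the Hadoop calls are left out so the decision logic runs on its own:

```java
import java.util.OptionalLong;

// Hypothetical helper, not part of Hadoop: keeps the age check separate from
// the FileSystem calls so it can be exercised without a cluster.
final class ModTimeCheck {

    private ModTimeCheck() {}

    // Assumption: a non-positive value means the store did not report a time
    // (as observed on S3 here), so it maps to "unknown" instead of epoch 0.
    static OptionalLong knownModTime(long reportedMillis) {
        return reportedMillis > 0 ? OptionalLong.of(reportedMillis) : OptionalLong.empty();
    }

    // True only when the time is known and strictly earlier than the cut-off,
    // so paths with an unknown time are rejected instead of all being accepted.
    static boolean olderThan(long reportedMillis, long cutoffMillis) {
        OptionalLong t = knownModTime(reportedMillis);
        return t.isPresent() && t.getAsLong() < cutoffMillis;
    }
}
```

Inside `accept`, the path's `getModificationTime()` value and `this.date.getMillis()` would be passed to `olderThan`. Rejecting unknown times errs toward skipping data; if skipping is worse than reprocessing for a given job, the fallback would have to go the other way.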

Besides this, I have a use case where we back up directories by renaming them with their
modification timestamp.
Here too the *filesystem* could be S3 or HDFS, so I need a generic solution.
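For the backup-by-rename case, a sketch of building the timestamped name, assuming a `<name>_yyyyMMdd-HHmmss` suffix in UTC (a hypothetical convention, not one taken from the issue). `BackupNames` is a hypothetical helper; it refuses to build a name from a time of 0, which on S3 would otherwise stamp every backup with 1970:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: derives a backup directory name from a modification time.
final class BackupNames {

    // Assumed naming convention: <name>_yyyyMMdd-HHmmss, rendered in UTC.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss").withZone(ZoneOffset.UTC);

    private BackupNames() {}

    static String backupName(String dirName, long modTimeMillis) {
        // A non-positive time is treated as "not reported" (the S3 symptom above).
        if (modTimeMillis <= 0) {
            throw new IllegalArgumentException("no modification time reported for " + dirName);
        }
        return dirName + "_" + STAMP.format(Instant.ofEpochMilli(modTimeMillis));
    }
}
```

With Hadoop, the result would become the destination of `FileSystem.rename(src, dst)`, e.g. `new Path(dir.getParent(), BackupNames.backupName(dir.getName(), status.getModificationTime()))`, where `status` is the directory's `FileStatus`. `rename` reports some failures by returning `false` rather than throwing, so its result should be checked.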

One possible workaround I can think of is writing a dummy file such as _SUCCESS in each of
these directories and then checking that file's modification time; however, that would be
extra effort.
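The workaround amounts to preferring the directory's own time and falling back to the marker's. A minimal sketch of that choice, assuming the marker is named `_SUCCESS` and that a time of 0 means "not reported"; `EffectiveModTime` is a hypothetical helper, and the FileSystem lookups are left to the caller:

```java
import java.util.OptionalLong;

// Hypothetical helper: picks a usable modification time for a directory.
final class EffectiveModTime {

    // Assumed marker name, matching the _SUCCESS convention mentioned above.
    static final String MARKER = "_SUCCESS";

    private EffectiveModTime() {}

    // dirMillis:    getModificationTime() of the directory itself.
    // markerMillis: getModificationTime() of <dir>/_SUCCESS, or empty if absent.
    static OptionalLong resolve(long dirMillis, OptionalLong markerMillis) {
        if (dirMillis > 0) {
            return OptionalLong.of(dirMillis);   // store reported a real directory time
        }
        if (markerMillis.isPresent() && markerMillis.getAsLong() > 0) {
            return markerMillis;                 // fall back to the marker file
        }
        return OptionalLong.empty();             // no usable time either way
    }
}
```

On HDFS the first branch would apply; on S3 the second would, provided the job that fills each directory also creates the marker (for example with `FileSystem.createNewFile(new Path(dir, EffectiveModTime.MARKER))`). The same code path then serves both stores.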

Thanks,
Jagdish
 

> FileStatus.getModificationTime not working on S3
> ------------------------------------------------
>
>                 Key: HADOOP-12837
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12837
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>            Reporter: Jagdish Kewat
>
> Hi Team,
> We have observed an issue with the FileStatus.getModificationTime() API on the S3 filesystem:
> the method always returns 0.
> I googled this but couldn't find a solution that fits my setup. S3FileStatus seems to be an
> option, but I use this API on both HDFS and S3, so I can't go with it.
> I tried to run the job on:
> * Release label:emr-4.2.0
> * Hadoop distribution:Amazon 2.6.0
> * Hadoop Common jar: hadoop-common-2.6.0.jar
> Please advise if any patch or fix is available for this.
> Thanks,
> Jagdish



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
