hadoop-user mailing list archives

From Parth Savani <pa...@sensenetworks.com>
Subject File Permissions on s3 FileSystem
Date Tue, 23 Oct 2012 17:32:39 GMT
Hello Everyone,
I am trying to run a Hadoop job with s3n as my filesystem.
I changed the following properties in my hdfs-site.xml:

fs.default.name=s3n://KEY:VALUE@bucket/
mapreduce.jobtracker.staging.root.dir=s3n://KEY:VALUE@bucket/tmp
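
(For reference, the same two settings applied programmatically; this is only a
sketch, with the credentials still redacted as KEY:VALUE and the class name made
up for illustration:)

import org.apache.hadoop.conf.Configuration;

public class S3nSettingsSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // The same two properties I set in hdfs-site.xml.
    conf.set("fs.default.name", "s3n://KEY:VALUE@bucket/");
    conf.set("mapreduce.jobtracker.staging.root.dir",
        "s3n://KEY:VALUE@bucket/tmp");
    // Echo them back to confirm what the job client would see.
    System.out.println(conf.get("fs.default.name"));
    System.out.println(conf.get("mapreduce.jobtracker.staging.root.dir"));
  }
}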

When I run the job from EC2, I get the following error:

The ownership on the staging directory
s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging
is not as expected. It is owned by . The directory must be owned by the
submitter ec2-user or by ec2-user
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:481)

I am using the Cloudera CDH4 Hadoop distribution. The error is thrown from the
JobSubmissionFiles class:
 public static Path getStagingDir(JobClient client, Configuration conf)
  throws IOException, InterruptedException {
    Path stagingArea = client.getStagingAreaDir();
    FileSystem fs = stagingArea.getFileSystem(conf);
    String realUser;
    String currentUser;
    UserGroupInformation ugi = UserGroupInformation.getLoginUser();
    realUser = ugi.getShortUserName();
    currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
    if (fs.exists(stagingArea)) {
      FileStatus fsStatus = fs.getFileStatus(stagingArea);
      String owner = fsStatus.getOwner();
      if (!(owner.equals(currentUser) || owner.equals(realUser))) {
         throw new IOException("The ownership on the staging directory " +
                      stagingArea + " is not as expected. " +
                      "It is owned by " + owner + ". The directory must " +
                      "be owned by the submitter " + currentUser + " or " +
                      "by " + realUser);
      }
      if (!fsStatus.getPermission().equals(JOB_DIR_PERMISSION)) {
        LOG.info("Permissions on staging directory " + stagingArea + " are " +
          "incorrect: " + fsStatus.getPermission() + ". Fixing permissions " +
          "to correct value " + JOB_DIR_PERMISSION);
        fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
      }
    } else {
      fs.mkdirs(stagingArea,
          new FsPermission(JOB_DIR_PERMISSION));
    }
    return stagingArea;
  }



I think the job ends up calling getOwner(), which comes back empty because s3
does not track file ownership or permissions, and that is what triggers the
IOException I am getting.
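
One way I could verify this (just a sketch; the path is the staging directory
from the error above with the credentials still redacted, and the class name is
made up for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckStagingOwner {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path staging = new Path("s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging");
    FileSystem fs = staging.getFileSystem(conf);
    FileStatus status = fs.getFileStatus(staging);
    // If s3n reports an empty owner here, the owner.equals(currentUser)
    // check in getStagingDir() can never pass for ec2-user.
    System.out.println("owner = '" + status.getOwner() + "'");
    System.out.println("permission = " + status.getPermission());
  }
}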

Is there any workaround for this? Any idea how I could use s3 as the filesystem
with Hadoop in distributed mode?
