Subject: Re: File Permissions on s3 FileSystem
From: Parth Savani
To: user@hadoop.apache.org
Date: Thu, 25 Oct 2012 15:52:51 -0400

Hello Harsh,
I am following the steps from this link: http://wiki.apache.org/hadoop/AmazonS3

When I run the job, I can see that Hadoop places all the jars required for the job on S3. However, when it tries to run the job, it complains:
The ownership on the staging directory s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging is not as expected. It is owned by . The directory must be owned by the submitter ec2-user or by ec2-user
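As a quick check on what is going on, something like the following sketch should show what owner s3n actually reports for that staging path (the bucket name and credentials are placeholders):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckStagingOwner {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.s3n.awsAccessKeyId", "KEY");        // placeholder credentials
    conf.set("fs.s3n.awsSecretAccessKey", "SECRET"); // placeholder credentials

    // Placeholder bucket/path, matching the staging dir from the error message.
    Path staging = new Path("s3n://bucket/tmp/ec2-user/.staging");
    FileSystem fs = FileSystem.get(URI.create("s3n://bucket/"), conf);

    if (fs.exists(staging)) {
      FileStatus status = fs.getFileStatus(staging);
      // Print whatever owner and permission the s3n filesystem reports.
      System.out.println("owner = '" + status.getOwner() + "'");
      System.out.println("permission = " + status.getPermission());
    } else {
      System.out.println(staging + " does not exist yet");
    }
  }
}

I would expect this to print an empty owner string, which matches the blank value in the error above.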

Some people seem to have solved this permissions problem here -> https://issues.apache.org/jira/browse/HDFS-1333
But they made changes to some Hadoop Java classes, and I wonder if there's an easier workaround.
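My guess is that the change amounts to relaxing the ownership check when the filesystem cannot report an owner at all; a self-contained sketch of that idea (only my reading of the approach, not the actual patch from that JIRA) would be something like:

import java.io.IOException;

public class RelaxedOwnerCheck {
  // Hypothetical relaxed version of the ownership check from
  // JobSubmissionFiles.getStagingDir (NOT the actual HDFS-1333 change):
  // an empty owner, as s3n reports, is treated as "unknown" rather than a mismatch.
  static void checkOwner(String owner, String currentUser, String realUser)
      throws IOException {
    if (owner == null || owner.isEmpty()) {
      return; // the filesystem cannot report ownership, so there is nothing to verify
    }
    if (!(owner.equals(currentUser) || owner.equals(realUser))) {
      throw new IOException("The staging directory is owned by " + owner
          + "; it must be owned by the submitter " + currentUser + " or by " + realUser);
    }
  }

  public static void main(String[] args) throws IOException {
    checkOwner("", "ec2-user", "ec2-user");         // empty owner from s3n: passes
    checkOwner("ec2-user", "ec2-user", "ec2-user"); // matching owner: passes
  }
}

Relaxing the check like this would of course also silence legitimate ownership mismatches, so it is a trade-off.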


On Wed, Oct 24, 2012 at 12:21 AM, Harsh J <harsh@cloudera.com> wrote:
Hey Parth,

I don't think it's possible to run MR by basing the FS over S3
completely. You can use S3 for I/O for your files, but your
fs.default.name (or fs.defaultFS) must be either file:/// or hdfs://
filesystems. This way, your MR framework can run/distribute its files
well, and also still be able to process S3 URLs passed as input or
output locations.
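A minimal sketch of that setup, with placeholder bucket, paths and credentials, and an identity mapper used purely for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class S3InOutJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // fs.default.name / fs.defaultFS stays on hdfs:// (as set in core-site.xml);
    // only the S3 credentials are supplied so s3n:// paths can be resolved.
    conf.set("fs.s3n.awsAccessKeyId", "KEY");        // placeholder
    conf.set("fs.s3n.awsSecretAccessKey", "SECRET"); // placeholder

    Job job = new Job(conf, "s3-in-out");
    job.setJarByClass(S3InOutJob.class);
    job.setMapperClass(Mapper.class); // identity mapper, map-only copy for illustration
    job.setNumReduceTasks(0);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    // Only the input and output locations live on S3; staging stays on HDFS.
    FileInputFormat.addInputPath(job, new Path("s3n://bucket/input"));    // placeholder
    FileOutputFormat.setOutputPath(job, new Path("s3n://bucket/output")); // placeholder

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Only the input and output locations point at S3 here; the staging directory and the framework's own files stay on the cluster's default HDFS filesystem.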

On Tue, Oct 23, 2012 at 11:02 PM, Parth Savani <parth@sensenetworks.com> wrote:
> Hello Everyone,
> I am trying to run a Hadoop job with s3n as my filesystem.
> I changed the following properties in my hdfs-site.xml
>
> fs.default.name=s3n://KEY:VALUE@bucket/
> mapreduce.jobtracker.staging.root.dir=s3n://KEY:VALUE@bucket/tmp
>
> When I run the job from EC2, I get the following error
>
> The ownership on the staging directory
> s3n://KEY:VALUE@bucket/tmp/ec2-user/.staging is not as expected. It is owned
> by . The directory must be owned by the submitter ec2-user or by ec2-user
> at
> org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:113)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:844)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:844)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:481)
>
> I am using the Cloudera CDH4 Hadoop distribution. The error is thrown from
> the JobSubmissionFiles class:
> public static Path getStagingDir(JobClient client, Configuration conf)
>     throws IOException, InterruptedException {
>   Path stagingArea = client.getStagingAreaDir();
>   FileSystem fs = stagingArea.getFileSystem(conf);
>   String realUser;
>   String currentUser;
>   UserGroupInformation ugi = UserGroupInformation.getLoginUser();
>   realUser = ugi.getShortUserName();
>   currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
>   if (fs.exists(stagingArea)) {
>     FileStatus fsStatus = fs.getFileStatus(stagingArea);
>     String owner = fsStatus.getOwner();
>     if (!(owner.equals(currentUser) || owner.equals(realUser))) {
>       throw new IOException("The ownership on the staging directory " +
>           stagingArea + " is not as expected. " +
>           "It is owned by " + owner + ". The directory must " +
>           "be owned by the submitter " + currentUser + " or " +
>           "by " + realUser);
>     }
>     if (!fsStatus.getPermission().equals(JOB_DIR_PERMISSION)) {
>       LOG.info("Permissions on staging directory " + stagingArea + " are " +
>           "incorrect: " + fsStatus.getPermission() + ". Fixing permissions " +
>           "to correct value " + JOB_DIR_PERMISSION);
>       fs.setPermission(stagingArea, JOB_DIR_PERMISSION);
>     }
>   } else {
>     fs.mkdirs(stagingArea, new FsPermission(JOB_DIR_PERMISSION));
>   }
>   return stagingArea;
> }
>
>
>
> I think my job calls getOwner(), which returns NULL since S3 does not have
> file permissions, and that results in the IOException I am getting.
>
> Any workaround for this? Any idea how I could use S3 as the filesystem with
> Hadoop in distributed mode?



--
Harsh J
