Subject: Re: Unable to Find S3N Filesystem Hadoop 2.6
From: Jonathan Aquilina <jaquilina@eagleeyet.net>
To: user@hadoop.apache.org
Date: Mon, 20 Apr 2015 15:58:28 +0200

You mention an environment variable. In the step before you specify the steps to run to get to the result, you can specify a bash script that will let you put any 3rd-party jar files on the cluster (for us that was the ESRI jars) and propagate them to all nodes in the cluster as well. You can ping me off-list if you need further help. The thing is, I haven't used Pig, but my boss and coworker wrote the mappers and reducers; getting those jars to the entire cluster was a very small and simple bash script.
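The jar-distribution script itself isn't included in the mail; below is a minimal sketch of that kind of script. Nothing in it comes from the thread: the host list (nodes.txt), the jar directory, and the destination path are all hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the kind of jar-sync script described above.
# The host list (nodes.txt), jar directory, and destination directory
# are illustrative, not taken from the original mail.

sync_jars() {
  local jar_dir="${1:-./thirdparty-jars}"   # 3rd-party jars (e.g. ESRI)
  local nodes_file="${2:-nodes.txt}"        # one hostname per line
  local dest="${3:-/usr/local/hadoop/share/hadoop/common/lib}"

  [ -f "$nodes_file" ] || { echo "no node list at $nodes_file; nothing to do"; return 0; }

  while read -r node; do
    [ -n "$node" ] || continue              # skip blank lines
    for jar in "$jar_dir"/*.jar; do
      [ -e "$jar" ] || continue             # no jars present in jar_dir
      if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "scp $jar $node:$dest/"        # print instead of copying
      else
        scp "$jar" "$node:$dest/"           # push the jar to this node
      fi
    done
  done < "$nodes_file"
}

sync_jars "$@"
```

With DRY_RUN=1 it only prints the copy commands, which makes it easy to check what would be pushed before touching the cluster.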

 

---
Regards,
Jonathan Aquilina
Founder, Eagle Eye T

On 2015-04-20 15:17, Billy Watson wrote:

Hi,

I am able to run `hadoop fs -ls s3n://my-s3-bucket` from the command line without issue. I have set some options in hadoop-env.sh to make sure all the S3 stuff for Hadoop 2.6 is set up correctly. (This was very confusing, BTW, and there is not enough searchable documentation on the changes to the S3 stuff in Hadoop 2.6, IMHO.)

Anyways, when I run a pig job which accesses s3, it gets to 16%, does not fail in pig, but rather fails in mapreduce with "Error: java.io.IOException: No FileSystem for scheme: s3n."

I have added [hadoop-install-loc]/lib and [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env variable in hadoop-env.sh.erb. When I do not do this, the pig job will fail at 0% (before it ever gets to mapreduce) with a very similar "No filesystem for scheme s3n" error.
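For reference, the hadoop-env.sh addition described above might look like the following sketch. The /usr/local/hadoop prefix is purely illustrative (the mail keeps it as [hadoop-install-loc]); the trailing /* is what makes Java pick up the individual jars inside the directory rather than the directory itself.

```shell
# hadoop-env.sh — sketch of the classpath additions described above;
# the /usr/local/hadoop install prefix is illustrative, not from the mail.
HADOOP_PREFIX="${HADOOP_PREFIX:-/usr/local/hadoop}"
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:-}:$HADOOP_PREFIX/lib:$HADOOP_PREFIX/share/hadoop/tools/lib/*"
```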

I feel like at this point I just have to add the share/hadoop/tools/lib directory (and maybe lib) to the right environment variable, but I can't figure out which environment variable that should be.
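One commonly suggested route for this symptom (it does not come from this thread) is to extend the task-side classpath rather than the client-side one: HADOOP_CLASSPATH is read by the client JVM, but the YARN containers that run the map tasks build their classpath from mapreduce.application.classpath. A sketch of the mapred-site.xml entry, with illustrative paths:

```xml
<!-- mapred-site.xml — sketch of a commonly suggested fix, not taken
     from this thread. In Hadoop 2.6 the hadoop-aws jar (which holds
     NativeS3FileSystem) lives under share/hadoop/tools/lib, which is
     not on the default task classpath. -->
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
</property>
```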

I appreciate any help, thanks!!


Stack trace:
org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
  at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
  at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
  at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
  at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
  at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
  at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
  at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
  at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
  at org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
  at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)


— Billy Watson

--

William Watson
Software Engineer
(904) 705-7056 PCS