hadoop-hdfs-user mailing list archives

From Chris Nauroth <cnaur...@hortonworks.com>
Subject Re: Unable to Find S3N Filesystem Hadoop 2.6
Date Wed, 22 Apr 2015 06:10:27 GMT
Hello Billy,

I think your experience indicates that our documentation is insufficient for discussing how
to configure and use the alternative file systems.  I filed issue HADOOP-11863 to track a
documentation enhancement.


Please feel free to watch that issue if you'd like to be informed as it makes progress.  Thank
you for reporting back to the thread after you had a solution.

Chris Nauroth

From: Billy Watson <williamrwatson@gmail.com>
Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Date: Monday, April 20, 2015 at 11:14 AM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: Re: Unable to Find S3N Filesystem Hadoop 2.6

We found the correct configs.

This post was helpful, but it didn't entirely work for us out of the box, since we are running Hadoop in pseudo-distributed mode.

We added a property to the core-site.xml file:

    <description>Tell hadoop which class to use to access s3 URLs. This change became necessary in hadoop 2.6.0.</description>

And updated the classpath for mapreduce applications:

    <description>The classpath specifically for mapreduce jobs. This override is necessary so that s3n URLs work on hadoop 2.6.0+.</description>
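The `<property>` bodies around those descriptions did not survive the archive. A sketch of what the two settings plausibly looked like, assuming the standard Hadoop keys `fs.s3n.impl` (core-site.xml) and `mapreduce.application.classpath` (mapred-site.xml); the exact classpath value below is an assumption based on a stock Hadoop 2.6 layout:

```xml
<!-- core-site.xml: map the s3n:// scheme to its FileSystem class
     (needed on Hadoop 2.6.0+, where this mapping is no longer loaded by default) -->
<property>
  <name>fs.s3n.impl</name>
  <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
</property>

<!-- mapred-site.xml: extend the MapReduce task classpath so s3n URLs also
     resolve inside running jobs; tools/lib is the assumed location of the
     hadoop-aws and jets3t jars -->
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
</property>
```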

William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 11:13 AM, Billy Watson <williamrwatson@gmail.com> wrote:
Thanks, anyways. Anyone else run into this issue?

William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 11:11 AM, Jonathan Aquilina <jaquilina@eagleeyet.net> wrote:
Sadly I'll have to pull back; I have only run a Hadoop map reduce cluster with Amazon EMR.

Sent from my iPhone

On 20 Apr 2015, at 16:53, Billy Watson <williamrwatson@gmail.com> wrote:

This is an install on a CentOS 6 virtual machine used in our test environment. We use HDP
in staging and production and we discovered these issues while trying to build a new cluster
using HDP 2.2 which upgrades from Hadoop 2.4 to Hadoop 2.6.

William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina <jaquilina@eagleeyet.net> wrote:

One thing I most likely missed completely: are you using an Amazon EMR cluster or something in-house?

Jonathan Aquilina
Founder Eagle Eye T

On 2015-04-20 16:21, Billy Watson wrote:

I appreciate the response. These JAR files aren't 3rd party. They're included with the Hadoop
distribution, but in Hadoop 2.6 they stopped being loaded by default and now they have to
be loaded manually, if needed.

Essentially the problem boils down to:

- need to access s3n URLs
- cannot access them without including the tools directory
- after including the tools directory in HADOOP_CLASSPATH, failures start happening later in the job
- need to find the right env variable (or shell script or whatever) to include jets3t & the other JARs needed to access s3n URLs (I think)
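The classpath step in the list above can be sketched as a hadoop-env.sh addition; the tools/lib path is an assumption based on a standard Hadoop 2.6 tarball layout:

```shell
# hadoop-env.sh: put the bundled S3 support (hadoop-aws, jets3t, etc.) on the
# client-side classpath so `hadoop fs` and the pig front end can resolve s3n://
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$HADOOP_HOME/share/hadoop/tools/lib/*"
```

Note that this only affects the client process; the tasks launched by MapReduce take their classpath from the job configuration, which is why the failure can move from 0% to later in the job.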

William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <jaquilina@eagleeyet.net> wrote:

You mention an environment variable. In the step before you specify the steps to run to get to the result, you can specify a bash script that will put any 3rd-party jar files (for us, we used Esri) on the cluster and propagate them to all nodes as well. You can ping me off-list if you need further help. Thing is, I haven't used Pig, but my boss and coworker wrote the mappers and reducers. Getting those jars to the entire cluster was a super small and simple bash script.

Jonathan Aquilina
Founder Eagle Eye T

On 2015-04-20 15:17, Billy Watson wrote:


I am able to run `hadoop fs -ls s3n://my-s3-bucket` from the command line without issue. I have set some options in hadoop-env.sh to make sure all the S3 stuff for Hadoop 2.6 is set up correctly. (This was very confusing, BTW, and there is not enough searchable documentation on the changes to the S3 stuff in Hadoop 2.6, IMHO.)

Anyways, when I run a pig job which accesses s3, it gets to 16%, does not fail in pig, but
rather fails in mapreduce with "Error: java.io.IOException: No FileSystem for scheme: s3n."
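For context on where that message comes from: Hadoop maps a URL's scheme to an implementation class via an `fs.<scheme>.impl` configuration key, and when no mapping (or class) is available it raises exactly this error. A toy Python sketch of that lookup, as an illustration only (this is not Hadoop's actual code):

```python
# Toy sketch of Hadoop-style filesystem resolution: the URL scheme is looked
# up under "fs.<scheme>.impl", and a missing mapping is what produces
# "No FileSystem for scheme: s3n".
from urllib.parse import urlparse

def get_filesystem_class(url, conf):
    scheme = urlparse(url).scheme
    impl = conf.get(f"fs.{scheme}.impl")
    if impl is None:
        raise IOError(f"No FileSystem for scheme: {scheme}")
    return impl

conf = {"fs.s3n.impl": "org.apache.hadoop.fs.s3native.NativeS3FileSystem"}
print(get_filesystem_class("s3n://my-s3-bucket/data", conf))
# prints org.apache.hadoop.fs.s3native.NativeS3FileSystem
```

This is consistent with the behavior described here: the client had the mapping available, but the MapReduce tasks did not, so the job died mid-run rather than up front.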

I have added [hadoop-install-loc]/lib and [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env variable in hadoop-env.sh.erb. When I do not do this, the pig job fails at 0% (before it ever gets to mapreduce) with a very similar "No filesystem for scheme: s3n" error.

I feel like at this point I just have to add the share/hadoop/tools/lib directory (and maybe
lib) to the right environment variable, but I can't figure out which environment variable
that should be.

I appreciate any help, thanks!!

Stack trace:
org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
    at org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

- Billy Watson


William Watson
Software Engineer
(904) 705-7056 PCS
