hadoop-common-user mailing list archives

From Takenori Sato <ts...@cloudian.com>
Subject Re: Unable to Find S3N Filesystem Hadoop 2.6
Date Wed, 22 Apr 2015 07:06:33 GMT
Hi Billy, Chris,

Let me share a couple of my findings.

I believe this was introduced by HADOOP-10893,
which first appeared in 2.6.0 (HDP 2.2).

1. fs.s3n.impl

> We added a property to the core-site.xml file:

You don't need to set this explicitly; it was never necessary in
previous versions.

Take a look at FileSystem#loadFileSystems, which is called from
FileSystem#getFileSystemClass.
Subclasses of FileSystem are loaded automatically as long as they are
visible to the class loader in use.

So you just need to make sure hadoop-aws.jar is on the classpath.
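
As a rough sketch of the mechanism (assuming the java.util.ServiceLoader
code path in FileSystem#loadFileSystems; the ListFileSystems class below is
only a hypothetical illustration), every implementation listed in a jar's
META-INF/services/org.apache.hadoop.fs.FileSystem file is discovered at
runtime:

    import java.util.ServiceLoader;
    import org.apache.hadoop.fs.FileSystem;

    public class ListFileSystems {
        public static void main(String[] args) {
            // Iterates over every FileSystem implementation registered in a
            // META-INF/services descriptor on the classpath; with
            // hadoop-aws.jar present, NativeS3FileSystem should show up here.
            for (FileSystem fs : ServiceLoader.load(FileSystem.class)) {
                System.out.println(fs.getClass().getName());
            }
        }
    }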

For the file system shell, this is done in hadoop-env.sh;
for an MR job, in mapreduce.application.classpath;
and for YARN, in yarn.application.classpath.
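
For example, for the file system shell on a vanilla install (a sketch;
the install prefix, assumed here to be /hadoop-2.6.0, should be adjusted
to your layout):

    # hadoop-env.sh: put hadoop-aws.jar and its dependencies (jets3t, etc.)
    # on the classpath used by the `hadoop fs` shell.
    export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/hadoop-2.6.0/share/hadoop/tools/lib/*"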

2. mapreduce.application.classpath

> And updated the classpath for mapreduce applications:

Note that on the default HDP 2.2 distribution it points into the
distributed cache.

    <property>
        <name>mapreduce.application.classpath</name>
        <value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure</value>
    </property>
* $PWD/mr-framework/hadoop/share/hadoop/tools/lib/* contains
hadoop-aws.jar (NativeS3FileSystem)
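
To verify which file systems a given jar registers for auto-loading, you
can dump its service descriptor (a sketch, assuming the jar name of a
vanilla 2.6.0 build):

    # Prints the FileSystem implementations the jar declares for ServiceLoader;
    # NativeS3FileSystem should be among them.
    unzip -p hadoop-aws-2.6.0.jar META-INF/services/org.apache.hadoop.fs.FileSystem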

On vanilla Hadoop, by contrast, it uses the standard paths, like yours.

    <property>
        <name>mapreduce.application.classpath</name>
        <value>/hadoop-2.6.0/etc/hadoop:/hadoop-2.6.0/share/hadoop/common/lib/*:/hadoop-2.6.0/share/hadoop/common/*:/hadoop-2.6.0/share/hadoop/hdfs:/hadoop-2.6.0/share/hadoop/hdfs/lib/*:/hadoop-2.6.0/share/hadoop/hdfs/*:/hadoop-2.6.0/share/hadoop/yarn/lib/*:/hadoop-2.6.0/share/hadoop/yarn/*:/hadoop-2.6.0/share/hadoop/mapreduce/lib/*:/hadoop-2.6.0/share/hadoop/mapreduce/*:/hadoop-2.6.0/contrib/capacity-scheduler/*.jar:/hadoop-2.6.0/share/hadoop/tools/lib/*</value>
    </property>

Thanks,
Sato

On Wed, Apr 22, 2015 at 3:10 PM, Chris Nauroth <cnauroth@hortonworks.com>
wrote:

>  Hello Billy,
>
>  I think your experience indicates that our documentation is insufficient
> for discussing how to configure and use the alternative file systems.  I
> filed issue HADOOP-11863 to track a documentation enhancement.
>
>  https://issues.apache.org/jira/browse/HADOOP-11863
>
>  Please feel free to watch that issue if you'd like to be informed as it
> makes progress.  Thank you for reporting back to the thread after you had a
> solution.
>
>   Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
>   From: Billy Watson <williamrwatson@gmail.com>
> Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Date: Monday, April 20, 2015 at 11:14 AM
> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Subject: Re: Unable to Find S3N Filesystem Hadoop 2.6
>
>   We found the correct configs.
>
>  This post was helpful, but didn't entirely work for us out of the box
> since we are running Hadoop in pseudo-distributed mode.
> http://hortonworks.com/community/forums/topic/s3n-error-for-hdp-2-2/
>
>  We added a property to the core-site.xml file:
>
>    <property>
>     <name>fs.s3n.impl</name>
>     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>     <description>Tell hadoop which class to use to access s3 URLs. This
> change became necessary in hadoop 2.6.0</description>
>   </property>
>
>  And updated the classpath for mapreduce applications:
>
>    <property>
>     <name>mapreduce.application.classpath</name>
>
> <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
>     <description>The classpath specifically for mapreduce jobs. This
> override is necessary so that s3n URLs work on hadoop 2.6.0+</description>
>   </property>
>
>   William Watson
> Software Engineer
> (904) 705-7056 PCS
>
> On Mon, Apr 20, 2015 at 11:13 AM, Billy Watson <williamrwatson@gmail.com>
> wrote:
>
>> Thanks, anyways. Anyone else run into this issue?
>>
>>   William Watson
>> Software Engineer
>> (904) 705-7056 PCS
>>
>>   On Mon, Apr 20, 2015 at 11:11 AM, Jonathan Aquilina <
>> jaquilina@eagleeyet.net> wrote:
>>
>>>  Sadly I'll have to pull back; I have only run a Hadoop MapReduce
>>> cluster with Amazon EMR.
>>>
>>> Sent from my iPhone
>>>
>>> On 20 Apr 2015, at 16:53, Billy Watson <williamrwatson@gmail.com> wrote:
>>>
>>>   This is an install on a CentOS 6 virtual machine used in our test
>>> environment. We use HDP in staging and production and we discovered these
>>> issues while trying to build a new cluster using HDP 2.2 which upgrades
>>> from Hadoop 2.4 to Hadoop 2.6.
>>>
>>>   William Watson
>>> Software Engineer
>>> (904) 705-7056 PCS
>>>
>>> On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina <
>>> jaquilina@eagleeyet.net> wrote:
>>>
>>>>  One thing I most likely missed completely: are you using an Amazon
>>>> EMR cluster or something in-house?
>>>>
>>>>
>>>>
>>>> ---
>>>> Regards,
>>>> Jonathan Aquilina
>>>> Founder Eagle Eye T
>>>>
>>>>   On 2015-04-20 16:21, Billy Watson wrote:
>>>>
>>>> I appreciate the response. These JAR files aren't 3rd party. They're
>>>> included with the Hadoop distribution, but in Hadoop 2.6 they stopped being
>>>> loaded by default and now they have to be loaded manually, if needed.
>>>>
>>>> Essentially the problem boils down to:
>>>>
>>>> - need to access s3n URLs
>>>> - cannot access without including the tools directory
>>>> - after including tools directory in HADOOP_CLASSPATH, failures start
>>>> happening later in job
>>>> - need to find right env variable (or shell script or w/e) to include
>>>> jets3t & other JARs needed to access s3n URLs (I think)
>>>>
>>>>
>>>>
>>>>   William Watson
>>>> Software Engineer
>>>> (904) 705-7056 PCS
>>>>
>>>> On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <
>>>> jaquilina@eagleeyet.net> wrote:
>>>>
>>>>>  you mention an environment variable. In the step before you specify
>>>>> the steps to run to get to the result, you can specify a bash script that
>>>>> will allow you to put any 3rd party jar files on the cluster (for us we
>>>>> used Esri) and propagate them to all nodes in the cluster as well. You can
>>>>> ping me off-list if you need further help. The thing is, I haven't used
>>>>> Pig, but my boss and coworker wrote the mappers and reducers. Getting
>>>>> these jars to the entire cluster took only a super small and simple bash
>>>>> script.
>>>>>
>>>>>
>>>>>
>>>>> ---
>>>>> Regards,
>>>>> Jonathan Aquilina
>>>>> Founder Eagle Eye T
>>>>>
>>>>>   On 2015-04-20 15:17, Billy Watson wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the command
>>>>> line without issue. I have set some options in hadoop-env.sh to make sure
>>>>> all the S3 stuff for hadoop 2.6 is set up correctly. (This was very
>>>>> confusing, BTW, and there is not enough searchable documentation on the
>>>>> changes to the s3 stuff in hadoop 2.6, IMHO.)
>>>>>
>>>>> Anyways, when I run a pig job which accesses s3, it gets to 16%, does
>>>>> not fail in pig, but rather fails in mapreduce with "Error:
>>>>> java.io.IOException: No FileSystem for scheme: s3n."
>>>>>
>>>>> I have added [hadoop-install-loc]/lib and
>>>>> [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env
>>>>> variable in hadoop-env.sh.erb. When I do not do this, the pig job will
>>>>> fail at 0% (before it ever gets to mapreduce) with a very similar "No
>>>>> FileSystem for scheme: s3n" error.
>>>>>
>>>>> I feel like at this point I just have to add the share/hadoop/tools/lib
>>>>> directory (and maybe lib) to the right environment variable, but I can't
>>>>> figure out which environment variable that should be.
>>>>>
>>>>> I appreciate any help, thanks!!
>>>>>
>>>>>
>>>>> Stack trace:
>>>>> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
>>>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
>>>>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>>>>> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
>>>>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
>>>>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>>>>> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>>>>> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
>>>>> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
>>>>> at org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
>>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
>>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
>>>>> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
>>>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
>>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>>>>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>>>
>>>>>
>>>>> — Billy Watson
>>>>>
>>>>> --
>>>>>  William Watson
>>>>> Software Engineer
>>>>> (904) 705-7056 PCS
>>>>>
>>>>>
>>>
>>
>
