hadoop-mapreduce-user mailing list archives

From Billy Watson <williamrwat...@gmail.com>
Subject Re: Unable to Find S3N Filesystem Hadoop 2.6
Date Wed, 22 Apr 2015 13:02:49 GMT
Chris and Sato,

Thanks a bunch! I've been so swamped by these and other issues we've been
having in scrambling to upgrade our cluster that I forgot to file a bug. I
certainly complained aloud that the docs were insufficient, but I didn't do
anything to help the community so thanks a bunch for recognizing that and
helping me out!

William Watson
Software Engineer
(904) 705-7056 PCS

On Wed, Apr 22, 2015 at 3:06 AM, Takenori Sato <tsato@cloudian.com> wrote:

> Hi Billy, Chris,
>
> Let me share a couple of my findings.
>
> I believe this was introduced by HADOOP-10893,
> which first appeared in 2.6.0 (HDP 2.2).
>
> 1. fs.s3n.impl
>
> > We added a property to the core-site.xml file:
>
> You don't need to set this explicitly. It was never necessary in
> previous versions.
>
> Take a look at FileSystem#loadFileSystems, which is called from
> FileSystem#getFileSystemClass.
> Subclasses of FileSystem are loaded automatically if they are available
> to the classloader in use.
>
> So you just need to make sure hadoop-aws.jar is on a classpath.
>
> For the file system shell, this is done in hadoop-env.sh,
> while for an MR job, in mapreduce.application.classpath,
> and for YARN, in yarn.application.classpath.
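For the shell case, a minimal hadoop-env.sh fragment might look like the following sketch. The install prefix is a placeholder; adjust it to your own layout.

```shell
# hadoop-env.sh fragment (illustrative; the install prefix is a placeholder).
# Puts hadoop-aws.jar and its dependencies (jets3t, etc.) on the classpath
# used by `hadoop fs` and the other shell commands.
export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/usr/local/hadoop-2.6.0/share/hadoop/tools/lib/*"
```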
>
> 2. mapreduce.application.classpath
>
> > And updated the classpath for mapreduce applications:
>
> Note that it points to a distributed cache on the default HDP 2.2
> distribution.
>
>     <property>
>         <name>mapreduce.application.classpath</name>
>
> <value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure</value>
>     </property>
> * $PWD/mr-framework/hadoop/share/hadoop/tools/lib/* contains
> hadoop-aws.jar (which provides NativeS3FileSystem)
>
> On vanilla Hadoop, by contrast, it uses standard paths like yours.
>
>     <property>
>         <name>mapreduce.application.classpath</name>
>
> <value>/hadoop-2.6.0/etc/hadoop:/hadoop-2.6.0/share/hadoop/common/lib/*:/hadoop-2.6.0/share/hadoop/common/*:/hadoop-2.6.0/share/hadoop/hdfs:/hadoop-2.6.0/share/hadoop/hdfs/lib/*:/hadoop-2.6.0/share/hadoop/hdfs/*:/hadoop-2.6.0/share/hadoop/yarn/lib/*:/hadoop-2.6.0/share/hadoop/yarn/*:/hadoop-2.6.0/share/hadoop/mapreduce/lib/*:/hadoop-2.6.0/share/hadoop/mapreduce/*:/hadoop-2.6.0/contrib/capacity-scheduler/*.jar:/hadoop-2.6.0/share/hadoop/tools/lib/*</value>
>     </property>
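A quick way to confirm the jar is actually where the vanilla layout puts it is to list the tools/lib directory. The sketch below simulates that tree in a temp directory for illustration; on a real node, point `ls` at /hadoop-2.6.0/share/hadoop/tools/lib/ instead.

```shell
# Simulated vanilla 2.6.0 layout, for illustration only.
demo=$(mktemp -d)
mkdir -p "$demo/share/hadoop/tools/lib"
touch "$demo/share/hadoop/tools/lib/hadoop-aws-2.6.0.jar"

# If this prints nothing on a real install, s3n cannot be resolved from it.
ls "$demo/share/hadoop/tools/lib/" | grep aws
```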
>
> Thanks,
> Sato
>
> On Wed, Apr 22, 2015 at 3:10 PM, Chris Nauroth <cnauroth@hortonworks.com>
> wrote:
>
>>  Hello Billy,
>>
>>  I think your experience indicates that our documentation is
>> insufficient for discussing how to configure and use the alternative file
>> systems.  I filed issue HADOOP-11863 to track a documentation enhancement.
>>
>>  https://issues.apache.org/jira/browse/HADOOP-11863
>>
>>  Please feel free to watch that issue if you'd like to be informed as it
>> makes progress.  Thank you for reporting back to the thread after you had a
>> solution.
>>
>>   Chris Nauroth
>> Hortonworks
>> http://hortonworks.com/
>>
>>
>>   From: Billy Watson <williamrwatson@gmail.com>
>> Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
>> Date: Monday, April 20, 2015 at 11:14 AM
>> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
>> Subject: Re: Unable to Find S3N Filesystem Hadoop 2.6
>>
>>   We found the correct configs.
>>
>>  This post was helpful, but didn't entirely work for us out of the box
>> since we are running Hadoop in pseudo-distributed mode.
>> http://hortonworks.com/community/forums/topic/s3n-error-for-hdp-2-2/
>>
>>  We added a property to the core-site.xml file:
>>
>>    <property>
>>     <name>fs.s3n.impl</name>
>>     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>>     <description>Tell hadoop which class to use to access s3 URLs. This
>> change became necessary in hadoop 2.6.0</description>
>>   </property>
>>
>>  And updated the classpath for mapreduce applications:
>>
>>    <property>
>>     <name>mapreduce.application.classpath</name>
>>
>> <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
>>     <description>The classpath specifically for mapreduce jobs. This
>> override is necessary so that s3n URLs work on hadoop 2.6.0+</description>
>>   </property>
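As a sanity check on the override above, the containment test the job effectively relies on can be sketched as below. The variable is illustrative; on a live node, inspect the output of `hadoop classpath` instead.

```shell
# Illustrative check: the value configured for mapreduce.application.classpath
# must include the tools/lib wildcard, otherwise hadoop-aws.jar (and with it
# the s3n filesystem) is invisible to MR tasks.
MR_CLASSPATH='$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*'
case "$MR_CLASSPATH" in
  *share/hadoop/tools/lib*) echo "s3n jars reachable" ;;
  *) echo "tools/lib missing: expect No FileSystem for scheme: s3n" ;;
esac
```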
>>
>>   William Watson
>> Software Engineer
>> (904) 705-7056 PCS
>>
>> On Mon, Apr 20, 2015 at 11:13 AM, Billy Watson <williamrwatson@gmail.com>
>> wrote:
>>
>>> Thanks, anyways. Anyone else run into this issue?
>>>
>>>   William Watson
>>> Software Engineer
>>> (904) 705-7056 PCS
>>>
>>>   On Mon, Apr 20, 2015 at 11:11 AM, Jonathan Aquilina <
>>> jaquilina@eagleeyet.net> wrote:
>>>
>>>>  Sadly, I'll have to pull back; I have only run a Hadoop MapReduce
>>>> cluster with Amazon EMR.
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On 20 Apr 2015, at 16:53, Billy Watson <williamrwatson@gmail.com>
>>>> wrote:
>>>>
>>>>   This is an install on a CentOS 6 virtual machine used in our test
>>>> environment. We use HDP in staging and production and we discovered these
>>>> issues while trying to build a new cluster using HDP 2.2 which upgrades
>>>> from Hadoop 2.4 to Hadoop 2.6.
>>>>
>>>>   William Watson
>>>> Software Engineer
>>>> (904) 705-7056 PCS
>>>>
>>>> On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina <
>>>> jaquilina@eagleeyet.net> wrote:
>>>>
>>>>>  One thing I most likely missed completely: are you using an Amazon
>>>>> EMR cluster or something in-house?
>>>>>
>>>>>
>>>>>
>>>>> ---
>>>>> Regards,
>>>>> Jonathan Aquilina
>>>>> Founder Eagle Eye T
>>>>>
>>>>>   On 2015-04-20 16:21, Billy Watson wrote:
>>>>>
>>>>> I appreciate the response. These JAR files aren't 3rd party. They're
>>>>> included with the Hadoop distribution, but in Hadoop 2.6 they stopped
>>>>> being loaded by default and now they have to be loaded manually, if needed.
>>>>>
>>>>> Essentially the problem boils down to:
>>>>>
>>>>> - need to access s3n URLs
>>>>> - cannot access without including the tools directory
>>>>> - after including tools directory in HADOOP_CLASSPATH, failures start
>>>>> happening later in job
>>>>> - need to find right env variable (or shell script or w/e) to include
>>>>> jets3t & other JARs needed to access s3n URLs (I think)
>>>>>
>>>>>
>>>>>
>>>>>   William Watson
>>>>> Software Engineer
>>>>> (904) 705-7056 PCS
>>>>>
>>>>> On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <
>>>>> jaquilina@eagleeyet.net> wrote:
>>>>>
>>>>>>  You mention an environment variable. In the step before you specify
>>>>>> the steps to run to get the result, you can specify a bash script
>>>>>> that will let you put any 3rd-party jar files (for us we used Esri)
>>>>>> on the cluster and propagate them to all nodes in the cluster as
>>>>>> well. You can ping me off list if you need further help. Thing is, I
>>>>>> haven't used Pig, but my boss and coworker wrote the mappers and
>>>>>> reducers. Getting these jars to the entire cluster was a super small
>>>>>> and simple bash script.
>>>>>>
>>>>>>
>>>>>>
>>>>>> ---
>>>>>> Regards,
>>>>>> Jonathan Aquilina
>>>>>> Founder Eagle Eye T
>>>>>>
>>>>>>   On 2015-04-20 15:17, Billy Watson wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the
>>>>>> command line without issue. I have set some options in hadoop-env.sh
>>>>>> to make sure all the S3 stuff for Hadoop 2.6 is set up correctly.
>>>>>> (This was very confusing, BTW, and there is not enough searchable
>>>>>> documentation on the changes to the S3 stuff in Hadoop 2.6, IMHO.)
>>>>>>
>>>>>> Anyways, when I run a pig job which accesses S3, it gets to 16%, does
>>>>>> not fail in pig, but rather fails in mapreduce with "Error:
>>>>>> java.io.IOException: No FileSystem for scheme: s3n."
>>>>>>
>>>>>> I have added [hadoop-install-loc]/lib and
>>>>>> [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH
>>>>>> env variable in hadoop-env.sh.erb. When I do not do this, the pig job
>>>>>> fails at 0% (before it ever gets to mapreduce) with a very similar
>>>>>> "No filesystem for scheme s3n" error.
>>>>>>
>>>>>> I feel like at this point I just have to add the
>>>>>> share/hadoop/tools/lib directory (and maybe lib) to the right
>>>>>> environment variable, but I can't figure out which environment
>>>>>> variable that should be.
>>>>>>
>>>>>> I appreciate any help, thanks!!
>>>>>>
>>>>>>
>>>>>> Stack trace:
>>>>>> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
>>>>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
>>>>>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>>>>>> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
>>>>>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
>>>>>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>>>>>> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>>>>>> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
>>>>>> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
>>>>>> at org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
>>>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
>>>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
>>>>>> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
>>>>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
>>>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>>>>>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>>>>
>>>>>>
>>>>>> — Billy Watson
>>>>>>
>>>>>> --
>>>>>>  William Watson
>>>>>> Software Engineer
>>>>>> (904) 705-7056 PCS
>>>>>>
>>>>>>
>>>>
>>>
>>
>
