hadoop-mapreduce-user mailing list archives

From Billy Watson <williamrwat...@gmail.com>
Subject Re: Unable to Find S3N Filesystem Hadoop 2.6
Date Wed, 22 Apr 2015 13:05:46 GMT
Sato,

Also, we did see an entirely different error when we didn't set
fs.s3n.impl, but now that we have it working, I can try removing that
property in development to verify.

That said, "it has never been done in previous versions" is irrelevant,
IMO. This was a big change, and that behavior certainly could have
changed. But if you're looking at the code, then I'm likely wrong.

William Watson
Software Engineer
(904) 705-7056 PCS

On Wed, Apr 22, 2015 at 9:02 AM, Billy Watson <williamrwatson@gmail.com>
wrote:

> Chris and Sato,
>
> Thanks a bunch! I've been so swamped by these and other issues we've been
> having in scrambling to upgrade our cluster that I forgot to file a bug. I
> certainly complained aloud that the docs were insufficient, but I didn't do
> anything to help the community so thanks a bunch for recognizing that and
> helping me out!
>
> William Watson
> Software Engineer
> (904) 705-7056 PCS
>
> On Wed, Apr 22, 2015 at 3:06 AM, Takenori Sato <tsato@cloudian.com> wrote:
>
>> Hi Billy, Chris,
>>
>> Let me share a couple of my findings.
>>
>> I believe this was introduced by HADOOP-10893,
>> which went in as of 2.6.0 (HDP 2.2).
>>
>> 1. fs.s3n.impl
>>
>> > We added a property to the core-site.xml file:
>>
>> You don't need to set this explicitly; it was never required in
>> previous versions.
>>
>> Take a look at FileSystem#loadFileSystems, which is called from
>> FileSystem#getFileSystemClass.
>> Subclasses of FileSystem are loaded automatically if they are available
>> to the classloader in use.
>>
>> So you just need to make sure hadoop-aws.jar is on the classpath.
>>
>> For the file system shell, this is done in hadoop-env.sh; for an MR
>> job, in mapreduce.application.classpath; and for YARN, in
>> yarn.application.classpath.
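>>
>> For example, for the shell case, something like this in hadoop-env.sh
>> should be enough (just a rough sketch; the /usr/local/hadoop path below
>> is a placeholder for wherever Hadoop is installed):
>>
>>     # Put hadoop-aws.jar and its dependencies (jets3t etc.) on the classpath
>>     export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/usr/local/hadoop/share/hadoop/tools/lib/*"
>>
>>     # Then confirm that the entry is actually picked up
>>     hadoop classpath | tr ':' '\n' | grep tools/lib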
>>
>> 2. mapreduce.application.classpath
>>
>> > And updated the classpath for mapreduce applications:
>>
>> Note that on the default HDP 2.2 distribution it points to a
>> distributed cache.
>>
>>     <property>
>>         <name>mapreduce.application.classpath</name>
>>
>> <value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure</value>
>>     </property>
>> * $PWD/mr-framework/hadoop/share/hadoop/tools/lib/* contains
>> hadoop-aws.jar (NativeS3FileSystem)
>>
>> On vanilla Hadoop, by contrast, it uses standard paths like yours.
>>
>>     <property>
>>         <name>mapreduce.application.classpath</name>
>>
>> <value>/hadoop-2.6.0/etc/hadoop:/hadoop-2.6.0/share/hadoop/common/lib/*:/hadoop-2.6.0/share/hadoop/common/*:/hadoop-2.6.0/share/hadoop/hdfs:/hadoop-2.6.0/share/hadoop/hdfs/lib/*:/hadoop-2.6.0/share/hadoop/hdfs/*:/hadoop-2.6.0/share/hadoop/yarn/lib/*:/hadoop-2.6.0/share/hadoop/yarn/*:/hadoop-2.6.0/share/hadoop/mapreduce/lib/*:/hadoop-2.6.0/share/hadoop/mapreduce/*:/hadoop-2.6.0/contrib/capacity-scheduler/*.jar:/hadoop-2.6.0/share/hadoop/tools/lib/*</value>
>>     </property>
>>
>> Thanks,
>> Sato
>>
>> On Wed, Apr 22, 2015 at 3:10 PM, Chris Nauroth <cnauroth@hortonworks.com>
>> wrote:
>>
>>>  Hello Billy,
>>>
>>>  I think your experience indicates that our documentation is
>>> insufficient for discussing how to configure and use the alternative file
>>> systems.  I filed issue HADOOP-11863 to track a documentation enhancement.
>>>
>>>  https://issues.apache.org/jira/browse/HADOOP-11863
>>>
>>>  Please feel free to watch that issue if you'd like to be informed as
>>> it makes progress.  Thank you for reporting back to the thread after you
>>> had a solution.
>>>
>>>   Chris Nauroth
>>> Hortonworks
>>> http://hortonworks.com/
>>>
>>>
>>>   From: Billy Watson <williamrwatson@gmail.com>
>>> Reply-To: "user@hadoop.apache.org" <user@hadoop.apache.org>
>>> Date: Monday, April 20, 2015 at 11:14 AM
>>> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
>>> Subject: Re: Unable to Find S3N Filesystem Hadoop 2.6
>>>
>>>   We found the correct configs.
>>>
>>>  This post was helpful, but didn't entirely work for us out of the box
>>> since we are running Hadoop in pseudo-distributed mode.
>>> http://hortonworks.com/community/forums/topic/s3n-error-for-hdp-2-2/
>>>
>>>  We added a property to the core-site.xml file:
>>>
>>>    <property>
>>>     <name>fs.s3n.impl</name>
>>>     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>>>     <description>Tell Hadoop which class to use to access s3n URLs.
>>> This change became necessary in Hadoop 2.6.0</description>
>>>   </property>
>>>
>>>  And updated the classpath for mapreduce applications:
>>>
>>>    <property>
>>>     <name>mapreduce.application.classpath</name>
>>>
>>> <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
>>>     <description>The classpath specifically for mapreduce jobs. This
>>> override is necessary so that s3n URLs work on Hadoop 2.6.0+</description>
>>>   </property>
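>>>
>>> With both changes in place, a quick end-to-end sanity check is to run one
>>> of the bundled example MR jobs against an s3n URL, since the failure only
>>> showed up once the job reached mapreduce (the bucket and paths below are
>>> placeholders):
>>>
>>>    # Submit a trivial MR job that reads its input from s3n
>>>    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
>>>      wordcount s3n://your-bucket/some-input /tmp/wordcount-out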
>>>
>>>   William Watson
>>> Software Engineer
>>> (904) 705-7056 PCS
>>>
>>> On Mon, Apr 20, 2015 at 11:13 AM, Billy Watson <williamrwatson@gmail.com
>>> > wrote:
>>>
>>>> Thanks, anyways. Anyone else run into this issue?
>>>>
>>>>   William Watson
>>>> Software Engineer
>>>> (904) 705-7056 PCS
>>>>
>>>>   On Mon, Apr 20, 2015 at 11:11 AM, Jonathan Aquilina <
>>>> jaquilina@eagleeyet.net> wrote:
>>>>
>>>>>  Sadly, I'll have to pull back; I have only run a Hadoop MapReduce
>>>>> cluster with Amazon EMR.
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On 20 Apr 2015, at 16:53, Billy Watson <williamrwatson@gmail.com>
>>>>> wrote:
>>>>>
>>>>>   This is an install on a CentOS 6 virtual machine used in our test
>>>>> environment. We use HDP in staging and production, and we discovered these
>>>>> issues while trying to build a new cluster using HDP 2.2, which upgrades
>>>>> from Hadoop 2.4 to Hadoop 2.6.
>>>>>
>>>>>   William Watson
>>>>> Software Engineer
>>>>> (904) 705-7056 PCS
>>>>>
>>>>> On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina <
>>>>> jaquilina@eagleeyet.net> wrote:
>>>>>
>>>>>>  One thing I most likely missed completely: are you using an Amazon
>>>>>> EMR cluster or something in-house?
>>>>>>
>>>>>>
>>>>>>
>>>>>> ---
>>>>>> Regards,
>>>>>> Jonathan Aquilina
>>>>>> Founder Eagle Eye T
>>>>>>
>>>>>>   On 2015-04-20 16:21, Billy Watson wrote:
>>>>>>
>>>>>> I appreciate the response. These JAR files aren't 3rd party. They're
>>>>>> included with the Hadoop distribution, but in Hadoop 2.6 they stopped
>>>>>> being loaded by default and now they have to be loaded manually, if
>>>>>> needed.
>>>>>>
>>>>>> Essentially the problem boils down to:
>>>>>>
>>>>>> - need to access s3n URLs
>>>>>> - cannot access without including the tools directory
>>>>>> - after including tools directory in HADOOP_CLASSPATH, failures start
>>>>>> happening later in job
>>>>>> - need to find right env variable (or shell script or w/e) to include
>>>>>> jets3t & other JARs needed to access s3n URLs (I think); see the sketch
>>>>>> right after this list
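>>>>>>
>>>>>> For reference, the JARs in question live under the tools/lib directory
>>>>>> of the install; something like this should list them (the
>>>>>> /usr/local/hadoop path is a placeholder):
>>>>>>
>>>>>>     ls /usr/local/hadoop/share/hadoop/tools/lib/ | grep -E 'hadoop-aws|jets3t'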
>>>>>>
>>>>>>
>>>>>>
>>>>>>   William Watson
>>>>>> Software Engineer
>>>>>> (904) 705-7056 PCS
>>>>>>
>>>>>> On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <
>>>>>> jaquilina@eagleeyet.net> wrote:
>>>>>>
>>>>>>>  You mention an environment variable. In the step before you specify
>>>>>>> the steps to run to get the result, you can specify a bash script that
>>>>>>> will let you put any 3rd-party JAR files on the cluster (for us we used
>>>>>>> Esri) and propagate them to all nodes in the cluster as well. You can
>>>>>>> ping me off-list if you need further help. Thing is, I haven't used Pig,
>>>>>>> but my boss and coworker wrote the mappers and reducers; getting these
>>>>>>> JARs to the entire cluster was a super small and simple bash script.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ---
>>>>>>> Regards,
>>>>>>> Jonathan Aquilina
>>>>>>> Founder Eagle Eye T
>>>>>>>
>>>>>>>   On 2015-04-20 15:17, Billy Watson wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the
>>>>>>> command line without issue. I have set some options in hadoop-env.sh
>>>>>>> to make sure all the S3 stuff for hadoop 2.6 is set up correctly.
>>>>>>> (This was very confusing, BTW, and there is not enough searchable
>>>>>>> documentation on the changes to the s3 stuff in hadoop 2.6, IMHO.)
>>>>>>>
>>>>>>> Anyways, when I run a pig job which accesses s3, it gets to 16%,
>>>>>>> does not fail in pig, but rather fails in mapreduce with "Error:
>>>>>>> java.io.IOException: No FileSystem for scheme: s3n."
>>>>>>>
>>>>>>> I have added [hadoop-install-loc]/lib and
>>>>>>> [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH
>>>>>>> env variable in hadoop-env.sh.erb. When I do not do this, the pig job
>>>>>>> will fail at 0% (before it ever gets to mapreduce) with a very similar
>>>>>>> "No filesystem for scheme s3n" error.
>>>>>>>
>>>>>>> I feel like at this point I just have to add the
>>>>>>> share/hadoop/tools/lib directory (and maybe lib) to the right
>>>>>>> environment variable, but I can't figure out which environment
>>>>>>> variable that should be.
>>>>>>>
>>>>>>> I appreciate any help, thanks!!
>>>>>>>
>>>>>>>
>>>>>>> Stack trace:
>>>>>>> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
>>>>>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
>>>>>>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>>>>>>> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
>>>>>>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
>>>>>>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>>>>>>> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>>>>>>> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
>>>>>>> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
>>>>>>> at org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
>>>>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
>>>>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
>>>>>>> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
>>>>>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
>>>>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>>>>>>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>>>>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>>>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>>>>>
>>>>>>>
>>>>>>> — Billy Watson
>>>>>>>
>>>>>>> --
>>>>>>>  William Watson
>>>>>>> Software Engineer
>>>>>>> (904) 705-7056 PCS
>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>
