hadoop-user mailing list archives

From Billy Watson <williamrwat...@gmail.com>
Subject Re: Unable to Find S3N Filesystem Hadoop 2.6
Date Mon, 20 Apr 2015 18:14:25 GMT
We found the correct configs.

This post was helpful, but didn't entirely work for us out of the box, since
we are running Hadoop in pseudo-distributed mode.
http://hortonworks.com/community/forums/topic/s3n-error-for-hdp-2-2/

We added a property to the core-site.xml file:

  <property>
    <name>fs.s3n.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
    <description>Tell Hadoop which class to use to access s3n URLs. This
change became necessary in Hadoop 2.6.0.</description>
  </property>

And updated the classpath for mapreduce applications:

  <property>
    <name>mapreduce.application.classpath</name>

<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*:$HADOOP_MAPRED_HOME/share/hadoop/tools/lib/*</value>
    <description>The classpath specifically for MapReduce jobs. This
override is necessary so that s3n URLs work on Hadoop 2.6.0+.</description>
  </property>
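
In case it helps anyone hitting the same thing, a quick sanity check after
restarting the daemons might look like the following. The bucket and paths
are placeholders, and it assumes your s3n credentials
(fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey) are already configured:

  # exercises the fs.s3n.impl property by itself
  hadoop fs -ls s3n://your-bucket/

  # a trivial MR job exercises mapreduce.application.classpath as well
  hadoop jar $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
    wordcount s3n://your-bucket/input s3n://your-bucket/output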

William Watson
Software Engineer
(904) 705-7056 PCS

On Mon, Apr 20, 2015 at 11:13 AM, Billy Watson <williamrwatson@gmail.com>
wrote:

> Thanks, anyways. Anyone else run into this issue?
>
> William Watson
> Software Engineer
> (904) 705-7056 PCS
>
> On Mon, Apr 20, 2015 at 11:11 AM, Jonathan Aquilina <
> jaquilina@eagleeyet.net> wrote:
>
>> Sadly, I'll have to pull back. I have only run a Hadoop MapReduce cluster
>> with Amazon EMR.
>>
>> Sent from my iPhone
>>
>> On 20 Apr 2015, at 16:53, Billy Watson <williamrwatson@gmail.com> wrote:
>>
>> This is an install on a CentOS 6 virtual machine used in our test
>> environment. We use HDP in staging and production and we discovered these
>> issues while trying to build a new cluster using HDP 2.2 which upgrades
>> from Hadoop 2.4 to Hadoop 2.6.
>>
>> William Watson
>> Software Engineer
>> (904) 705-7056 PCS
>>
>> On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina <
>> jaquilina@eagleeyet.net> wrote:
>>
>>>  One thing I think I most likely missed completely: are you using an
>>> Amazon EMR cluster or something in-house?
>>>
>>>
>>>
>>> ---
>>> Regards,
>>> Jonathan Aquilina
>>> Founder Eagle Eye T
>>>
>>>  On 2015-04-20 16:21, Billy Watson wrote:
>>>
>>> I appreciate the response. These JAR files aren't 3rd party. They're
>>> included with the Hadoop distribution, but in Hadoop 2.6 they stopped being
>>> loaded by default and now they have to be loaded manually, if needed.
>>>
>>> Essentially the problem boils down to:
>>>
>>> - need to access s3n URLs
>>> - cannot access without including the tools directory
>>> - after including tools directory in HADOOP_CLASSPATH, failures start
>>> happening later in job
>>> - need to find right env variable (or shell script or w/e) to include
>>> jets3t & other JARs needed to access s3n URLs (I think) -- see the
>>> listing just below for where those JARs live
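>>>
>>> For reference, in a stock Apache Hadoop 2.6 tarball those JARs live under
>>> the tools directory (the version numbers below are an assumption based on
>>> what 2.6.0 ships with, so treat them as approximate):
>>>
>>>   ls [hadoop-install-loc]/share/hadoop/tools/lib/ | grep -Ei 'jets3t|aws'
>>>   hadoop-aws-2.6.0.jar
>>>   jets3t-0.9.0.jar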
>>>
>>>
>>>
>>>   William Watson
>>> Software Engineer
>>> (904) 705-7056 PCS
>>>
>>> On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <
>>> jaquilina@eagleeyet.net> wrote:
>>>
>>>>  You mention an environment variable. In the step where you specify the
>>>> steps to run to get to the result, you can specify a bash script that
>>>> will let you put any 3rd-party JAR files on the cluster and propagate
>>>> them to all nodes as well (for us, the JARs were Esri's). You can ping
>>>> me off-list if you need further help. Thing is, I haven't used Pig, but
>>>> my boss and coworker wrote the mappers and reducers. Getting these JARs
>>>> to the entire cluster was a super small and simple bash script.
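>>>>
>>>> A minimal sketch of that kind of script, assuming passwordless SSH to
>>>> the nodes; the JAR name, slaves-file location, and destination path are
>>>> all placeholders:
>>>>
>>>>   #!/bin/bash
>>>>   # push a 3rd-party JAR to every node listed in the slaves file
>>>>   for host in $(cat /etc/hadoop/conf/slaves); do
>>>>     scp /tmp/esri-geometry-api.jar "$host":/usr/lib/hadoop/lib/
>>>>   done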
>>>>
>>>>
>>>>
>>>> ---
>>>> Regards,
>>>> Jonathan Aquilina
>>>> Founder Eagle Eye T
>>>>
>>>>   On 2015-04-20 15:17, Billy Watson wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am able to run a `hadoop fs -ls s3n://my-s3-bucket` from the command
>>>> line without issue. I have set some options in hadoop-env.sh to make sure
>>>> all the S3 stuff for Hadoop 2.6 is set up correctly. (This was very
>>>> confusing, BTW, and there is not enough searchable documentation on the
>>>> changes to the S3 stuff in Hadoop 2.6, IMHO.)
>>>>
>>>> Anyways, when I run a pig job which accesses s3, it gets to 16%, does
>>>> not fail in pig, but rather fails in mapreduce with "Error:
>>>> java.io.IOException: No FileSystem for scheme: s3n."
>>>>
>>>> I have added [hadoop-install-loc]/lib and
>>>> [hadoop-install-loc]/share/hadoop/tools/lib/ to the HADOOP_CLASSPATH env
>>>> variable in hadoop-env.sh.erb. When I do not do this, the pig job will fail
>>>> at 0% (before it ever gets to mapreduce) with a very similar "No filesystem
>>>> for scheme s3n" error.
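>>>>
>>>> Concretely, the hadoop-env.sh.erb line I'm describing looks roughly like
>>>> this (keeping the install-location placeholder from above):
>>>>
>>>>   export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:[hadoop-install-loc]/lib:[hadoop-install-loc]/share/hadoop/tools/lib/*"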
>>>>
>>>> I feel like at this point I just have to add the share/hadoop/tools/lib
>>>> directory (and maybe lib) to the right environment variable, but I can't
>>>> figure out which environment variable that should be.
>>>>
>>>> I appreciate any help, thanks!!
>>>>
>>>>
>>>> Stack trace:
>>>> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
>>>> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
>>>> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>>>> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
>>>> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
>>>> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>>>> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>>>> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
>>>> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
>>>> at org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
>>>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
>>>> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
>>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
>>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>>>> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>>> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>>
>>>>
>>>> — Billy Watson
>>>>
>>>> --
>>>>  William Watson
>>>> Software Engineer
>>>> (904) 705-7056 PCS
>>>>
>>>>
>>
>
