hadoop-user mailing list archives

From: Jonathan Aquilina <jaquil...@eagleeyet.net>
Subject: Re: Unable to Find S3N Filesystem Hadoop 2.6
Date: Mon, 20 Apr 2015 15:11:44 GMT
Sadly I'll have to pull back; I have only run a Hadoop MapReduce cluster with Amazon EMR.

Sent from my iPhone

> On 20 Apr 2015, at 16:53, Billy Watson <williamrwatson@gmail.com> wrote:
> 
> This is an install on a CentOS 6 virtual machine used in our test environment. We use
> HDP in staging and production, and we discovered these issues while trying to build a new cluster
> using HDP 2.2, which upgrades from Hadoop 2.4 to Hadoop 2.6.
> 
> William Watson
> Software Engineer
> (904) 705-7056 PCS
> 
>> On Mon, Apr 20, 2015 at 10:26 AM, Jonathan Aquilina <jaquilina@eagleeyet.net> wrote:
>> One thing I think I most likely missed completely: are you using an Amazon
>> EMR cluster or something in-house?
>> 
>>  
>> 
>> ---
>> Regards,
>> Jonathan Aquilina
>> Founder Eagle Eye T
>>> On 2015-04-20 16:21, Billy Watson wrote:
>>> 
>>> I appreciate the response. These JAR files aren't 3rd party. They're included
>>> with the Hadoop distribution, but in Hadoop 2.6 they stopped being loaded by default and now
>>> they have to be loaded manually, if needed.
>>>  
>>> Essentially the problem boils down to:
>>>  
>>> - need to access s3n URLs
>>> - cannot access without including the tools directory
>>> - after including the tools directory in HADOOP_CLASSPATH, failures start happening
>>> later in the job
>>> - need to find the right env variable (or shell script or w/e) to include jets3t
>>> & the other JARs needed to access s3n URLs (I think); see the sketch below
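>>>  
>>> For concreteness, the client-side piece of what I have now looks roughly like this (the install path is illustrative for my box; adjust for yours):
>>>  
>>>     # hadoop-env.sh -- lets the client-side JVMs (hadoop fs, the pig front end)
>>>     # find the s3n filesystem classes shipped under the tools directory
>>>     export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:/usr/local/hadoop/share/hadoop/tools/lib/*"
>>>  
>>> My guess is the map tasks need those same JARs on their own classpath too, maybe via
>>> mapreduce.application.classpath in mapred-site.xml (appending
>>> $HADOOP_MAPRED_HOME/share/hadoop/tools/lib/* to its default value), but that's
>>> exactly the part I haven't confirmed.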
>>>  
>>>  
>>> 
>>> William Watson
>>> Software Engineer
>>> (904) 705-7056 PCS
>>> 
>>>> On Mon, Apr 20, 2015 at 9:58 AM, Jonathan Aquilina <jaquilina@eagleeyet.net> wrote:
>>>> You mention an environment variable. Before the step where you specify the steps
>>>> to run to get to the result, you can specify a bash script that will put any
>>>> 3rd-party JAR files (for us it was ESRI) on the cluster and propagate them to all nodes in
>>>> the cluster as well. You can ping me off-list if you need further help. Thing is, I haven't
>>>> used Pig, but my boss and coworker wrote the mappers and reducers; getting those JARs to the
>>>> entire cluster was a super small and simple bash script.
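>>>> 
>>>> It was basically just this (bucket and JAR names changed; a from-memory sketch, not the exact script):
>>>> 
>>>>     #!/bin/bash
>>>>     # EMR bootstrap action: EMR runs this on every node as it starts up,
>>>>     # so fetching the JARs here effectively propagates them cluster-wide.
>>>>     mkdir -p /home/hadoop/lib
>>>>     # hypothetical bucket/key -- substitute wherever your JARs live
>>>>     hadoop fs -get s3n://my-bootstrap-bucket/jars/esri-geometry-api.jar /home/hadoop/lib/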
>>>> 
>>>>  
>>>> 
>>>> ---
>>>> Regards,
>>>> Jonathan Aquilina
>>>> Founder Eagle Eye T
>>>> On 2015-04-20 15:17, Billy Watson wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I am able to run `hadoop fs -ls s3n://my-s3-bucket` from the command line
>>>> without issue. I have set some options in hadoop-env.sh to make sure all the S3 stuff for
>>>> Hadoop 2.6 is set up correctly. (This was very confusing, BTW; IMHO there is not enough
>>>> searchable documentation on the changes to the S3 stuff in Hadoop 2.6.)
>>>> 
>>>> Anyway, when I run a Pig job which accesses S3, it gets to 16%; it does not
>>>> fail in Pig, but rather fails in MapReduce with "Error: java.io.IOException: No FileSystem
>>>> for scheme: s3n."
>>>> 
>>>> I have added [hadoop-install-loc]/lib and [hadoop-install-loc]/share/hadoop/tools/lib/
>>>> to the HADOOP_CLASSPATH env variable in hadoop-env.sh.erb. When I do not do this, the Pig
>>>> job fails at 0% (before it ever gets to MapReduce) with a very similar "No FileSystem for
>>>> scheme: s3n" error.
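>>>> 
>>>> As a sanity check on what the client JVM actually sees (just a diagnostic, not a fix):
>>>> 
>>>>     # print the effective client classpath, one entry per line, and
>>>>     # look for the tools dir and the s3n-related JARs
>>>>     hadoop classpath | tr ':' '\n' | grep -Ei 'tools/lib|hadoop-aws|jets3t'
>>>> 
>>>> With the tools lib on HADOOP_CLASSPATH those entries show up there, which matches the CLI working; whatever classpath the map tasks get apparently does not include them.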
>>>> 
>>>> I feel like at this point I just have to add the share/hadoop/tools/lib directory
>>>> (and maybe lib) to the right environment variable, but I can't figure out which environment
>>>> variable that should be.
>>>> 
>>>> I appreciate any help, thanks!!
>>>> 
>>>> 
>>>> Stack trace:
>>>> java.io.IOException: No FileSystem for scheme: s3n
>>>>     at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
>>>>     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
>>>>     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
>>>>     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
>>>>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
>>>>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
>>>>     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>>>>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:498)
>>>>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.setInputPaths(FileInputFormat.java:467)
>>>>     at org.apache.pig.piggybank.storage.CSVExcelStorage.setLocation(CSVExcelStorage.java:609)
>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:129)
>>>>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:103)
>>>>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:512)
>>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:755)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>>> 
>>>> 
>>>> — Billy Watson
>>>> 
>>>> --
>>>> 
>>>> William Watson
>>>> Software Engineer
>>>> (904) 705-7056 PCS
> 
