hadoop-mapreduce-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: Running YARN on top of legacy HDFS (i.e. 0.20)
Date Thu, 08 Dec 2011 21:30:49 GMT
I was able to convert FileContext to FileSystem and related methods
fairly straightforwardly, but am running into security
incompatibilities (e.g. UserGroupInformation).  Yuck.

Avery

On 12/6/11 3:50 PM, Arun C Murthy wrote:
> Avery,
>
> If you could take a look at what it would take, I'd be grateful. I'm hoping it isn't very much effort.
>
> thanks,
> Arun
>
> On Dec 6, 2011, at 10:05 AM, Avery Ching wrote:
>
>> I think it would be nice if YARN could work on existing older HDFS instances, a lot of folks will be slow to upgrade HDFS with all their important data on it.  I could also go that route I guess.
>>
>> Avery
>>
>> On 12/6/11 8:51 AM, Arun C Murthy wrote:
>>> Avery,
>>>
>>>   They aren't 'api changes'. HDFS just has a new set of apis in hadoop-0.23 (aka FileContext apis). Both the old (FileSystem apis) and new are supported in hadoop-0.23.
>>>
>>>   We have used the new HDFS apis in YARN in some places.
>>>
>>> hth,
>>> Arun
>>>
>>> On Dec 5, 2011, at 10:59 PM, Avery Ching wrote:
>>>
>>>> Thank you for the response, that's what I thought as well =).  I spent the day trying to port the required 0.23 APIs to 0.20 HDFS.  There have been a lot of API changes!
>>>>
>>>> Avery
>>>>
>>>> On 12/5/11 9:14 PM, Mahadev Konar wrote:
>>>>> Avery,
>>>>>   Currently we have only tested 0.23 MRv2 with 0.23 HDFS. I might be
>>>>> wrong, but looking at the HDFS APIs it doesn't look like it would
>>>>> be a lot of work to get it working with the 0.20 APIs. We had been
>>>>> using the FileContext APIs initially but have transitioned back to the
>>>>> old APIs.
>>>>>
>>>>> Hope that helps.
>>>>>
>>>>> mahadev
>>>>>
>>>>> On Mon, Dec 5, 2011 at 4:01 PM, Avery Ching <aching@apache.org> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I've been playing with 0.23.0, really nice stuff!  I was able to set up a
>>>>>> small test cluster (40 nodes) and launch the example jobs.  I was also able
>>>>>> to recompile old Hadoop programs with the new jars and start up those
>>>>>> programs as well.  My question is the following:
>>>>>>
>>>>>> We have an HDFS instance based on 0.20 that I would like to hook up to YARN.
>>>>>> This appears to be a bit of work.  Launching the jobs gives me the
>>>>>> following error:
>>>>>>
>>>>>> 2011-12-05 15:48:05,023 INFO  ipc.YarnRPC (YarnRPC.java:create(47)) - Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
>>>>>> 2011-12-05 15:48:05,040 INFO  mapred.ResourceMgrDelegate (ResourceMgrDelegate.java:<init>(95)) - Connecting to ResourceManager at {removed}.{xxx}/{removed}:50177
>>>>>> 2011-12-05 15:48:05,041 INFO  ipc.HadoopYarnRPC (HadoopYarnProtoRPC.java:getProxy(48)) - Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.yarn.api.ClientRMProtocol
>>>>>> 2011-12-05 15:48:05,121 INFO  mapred.ResourceMgrDelegate (ResourceMgrDelegate.java:<init>(99)) - Connected to ResourceManager at {removed}.{xxx}/{removed}:50177
>>>>>> 2011-12-05 15:48:05,133 INFO  mapreduce.Cluster (Cluster.java:initialize(116)) - Failed to use org.apache.hadoop.mapred.YarnClientProtocolProvider due to error: java.lang.ClassNotFoundException: org.apache.hadoop.fs.Hdfs
>>>>>> Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
>>>>>>     at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:123)
>>>>>>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:85)
>>>>>>     at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:78)
>>>>>>     at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1129)
>>>>>>     at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1125)
>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
>>>>>>     at org.apache.hadoop.mapreduce.Job.connect(Job.java:1124)
>>>>>>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1153)
>>>>>>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1176)
>>>>>>     at org.apache.giraph.graph.GiraphJob.run(GiraphJob.java:560)
>>>>>>     at org.apache.giraph.benchmark.PageRankBenchmark.run(PageRankBenchmark.java:193)
>>>>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
>>>>>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
>>>>>>     at org.apache.giraph.benchmark.PageRankBenchmark.main(PageRankBenchmark.java:201)
>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:189)
>>>>>>
>>>>>> After doing a little digging it appears that YarnClientProtocolProvider
>>>>>> creates a YARNRunner that uses org.apache.hadoop.fs.Hdfs, a class that is
>>>>>> not available in older versions of HDFS.
>>>>>>
>>>>>> What versions of HDFS are currently supported and what HDFS versions are
>>>>>> planned for support?  It would be great to be able to run YARN on legacy
>>>>>> HDFS installations.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Avery
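
The ClassNotFoundException in the original report comes down to whether org.apache.hadoop.fs.Hdfs is on the client classpath. A minimal sketch of checking for it up front, before submitting a job, might look like the following. The class name ClasspathProbe and the method isClassAvailable are hypothetical names chosen for illustration and are not part of Hadoop; the sketch uses only the standard Java reflection API and assumes it is run with the same classpath as the job client.

```java
public class ClasspathProbe {

    // Hypothetical helper (not a Hadoop API): returns true if the named
    // class can be located by this class's class loader.
    public static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its
            // static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hdfs is the class named in the stack trace above; FileSystem is
        // the older API mentioned in the thread.
        String[] required = {
            "org.apache.hadoop.fs.Hdfs",
            "org.apache.hadoop.fs.FileSystem",
        };
        for (String name : required) {
            String status = isClassAvailable(name) ? "found" : "MISSING";
            System.out.println(name + " -> " + status);
        }
    }
}
```

Running this against the jars of a 0.20-based install would presumably report Hdfs as MISSING, matching the error above; it only diagnoses the classpath and does nothing to bridge the API or security differences discussed in the thread.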

