hadoop-mapreduce-user mailing list archives

From Arun C Murthy <...@hortonworks.com>
Subject Re: Running YARN on top of legacy HDFS (i.e. 0.20)
Date Tue, 06 Dec 2011 23:50:59 GMT
Avery,

If you could take a look at what it would take, I'd be grateful. I'm hoping it isn't very
much effort.

thanks,
Arun

On Dec 6, 2011, at 10:05 AM, Avery Ching wrote:

> I think it would be nice if YARN could work on existing older HDFS instances; a lot of
> folks will be slow to upgrade HDFS with all their important data on it.  I could also go that
> route, I guess.
> 
> Avery
> 
> On 12/6/11 8:51 AM, Arun C Murthy wrote:
>> Avery,
>> 
>>  They aren't 'api changes'. HDFS just has a new set of apis in hadoop-0.23 (aka FileContext
>> apis). Both the old (FileSystem apis) and new are supported in hadoop-0.23.
>> 
>>  We have used the new HDFS apis in YARN in some places.
>> 
>> hth,
>> Arun
>> 
>> On Dec 5, 2011, at 10:59 PM, Avery Ching wrote:
>> 
>>> Thank you for the response, that's what I thought as well =).  I spent the day
>>> trying to port the required 0.23 APIs to 0.20 HDFS.  There have been a lot of API changes!
>>> 
>>> Avery
>>> 
>>> On 12/5/11 9:14 PM, Mahadev Konar wrote:
>>>> Avery,
>>>>  Currently we have only tested 0.23 MRv2 with 0.23 HDFS. I might be
>>>> wrong, but looking at the HDFS APIs it doesn't look like it would
>>>> be a lot of work to get it working with the 0.20 APIs. We had been
>>>> using the FileContext APIs initially but have transitioned back to the
>>>> old APIs.
>>>> 
>>>> Hope that helps.
>>>> 
>>>> mahadev
>>>> 
>>>> On Mon, Dec 5, 2011 at 4:01 PM, Avery Ching<aching@apache.org>   wrote:
>>>>> Hi,
>>>>> 
>>>>> I've been playing with 0.23.0, really nice stuff!  I was able to set up a
>>>>> small test cluster (40 nodes) and launch the example jobs.  I was also able
>>>>> to recompile old Hadoop programs with the new jars and start up those
>>>>> programs as well.  My question is the following:
>>>>> 
>>>>> We have an HDFS instance based on 0.20 that I would like to hook up to YARN.
>>>>> This appears to be a bit of work.  Launching the jobs gives me the
>>>>> following error:
>>>>> 
>>>>> 2011-12-05 15:48:05,023 INFO  ipc.YarnRPC (YarnRPC.java:create(47)) - Creating YarnRPC for org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
>>>>> 2011-12-05 15:48:05,040 INFO  mapred.ResourceMgrDelegate (ResourceMgrDelegate.java:<init>(95)) - Connecting to ResourceManager at {removed}.{xxx}/{removed}:50177
>>>>> 2011-12-05 15:48:05,041 INFO  ipc.HadoopYarnRPC (HadoopYarnProtoRPC.java:getProxy(48)) - Creating a HadoopYarnProtoRpc proxy for protocol interface org.apache.hadoop.yarn.api.ClientRMProtocol
>>>>> 2011-12-05 15:48:05,121 INFO  mapred.ResourceMgrDelegate (ResourceMgrDelegate.java:<init>(99)) - Connected to ResourceManager at {removed}.{xxx}/{removed}:50177
>>>>> 2011-12-05 15:48:05,133 INFO  mapreduce.Cluster (Cluster.java:initialize(116)) - Failed to use org.apache.hadoop.mapred.YarnClientProtocolProvider due to error: java.lang.ClassNotFoundException: org.apache.hadoop.fs.Hdfs
>>>>> Exception in thread "main" java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
>>>>>    at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:123)
>>>>>    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:85)
>>>>>    at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:78)
>>>>>    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1129)
>>>>>    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:1125)
>>>>>    at java.security.AccessController.doPrivileged(Native Method)
>>>>>    at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1152)
>>>>>    at org.apache.hadoop.mapreduce.Job.connect(Job.java:1124)
>>>>>    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1153)
>>>>>    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1176)
>>>>>    at org.apache.giraph.graph.GiraphJob.run(GiraphJob.java:560)
>>>>>    at org.apache.giraph.benchmark.PageRankBenchmark.run(PageRankBenchmark.java:193)
>>>>>    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
>>>>>    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:83)
>>>>>    at org.apache.giraph.benchmark.PageRankBenchmark.main(PageRankBenchmark.java:201)
>>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>    at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>    at org.apache.hadoop.util.RunJar.main(RunJar.java:189)
>>>>> 
>>>>> After doing a little digging it appears that YarnClientProtocolProvider
>>>>> creates a YARNRunner that uses org.apache.hadoop.fs.Hdfs, a class that is
>>>>> not available in older versions of HDFS.
>>>>> 
>>>>> What versions of HDFS are currently supported and what HDFS versions are
>>>>> planned for support?  It would be great to be able to run YARN on legacy
>>>>> HDFS installations.
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Avery
> 
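
The ClassNotFoundException in the original report comes from org.apache.hadoop.fs.Hdfs being absent from the client classpath. A minimal sketch of how one might probe for that class before submitting a job follows. It uses only the JDK; the class name HdfsClassCheck and the helper isClassAvailable are hypothetical names chosen for illustration and are not part of Hadoop.

```java
public class HdfsClassCheck {

    // Class named in the ClassNotFoundException from the report above.
    static final String HDFS_CLASS = "org.apache.hadoop.fs.Hdfs";

    // Hypothetical helper: returns true if the named class can be loaded
    // from this classpath. The class is looked up without running its
    // static initializers.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false, HdfsClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isClassAvailable(HDFS_CLASS)) {
            System.out.println(HDFS_CLASS + " found on the classpath");
        } else {
            System.out.println(HDFS_CLASS + " missing; the HDFS jars on the"
                    + " classpath likely predate this class");
        }
    }
}
```

Running this with the same classpath as the failing job would only confirm whether the class is visible to the client JVM. It says nothing about wire compatibility between a 0.23 client and a 0.20 NameNode.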

