hadoop-common-user mailing list archives

From Gianluigi Zanetti <gianluigi.zane...@crs4.it>
Subject Re: sun grid engine and map reduce
Date Fri, 11 Dec 2009 07:58:17 GMT
Hello Himanshu.
Could you please describe in more detail your use case?

There are two basic Gridengine integration schemes:

1/ Native integration in grid engine
This is the one referred to in Dan Templeton's blog. It is based on the
assumption that hdfs is always running, and it will essentially have
your map-reduce job (including a per-job jobtracker) scheduled as a
parallel environment 'as-close-as-possible' to your hdfs data. 
It will be in GE 6.2u5, which is currently in beta and should be out any
moment now. It is possible to back-port to 6.2u4 and probably to 6.2u3.

2/ HOD integration
What hod does, in a nutshell, is to allocate a group of machines as a
parallel environment within GE and to run a jobtracker and a namenode
that will control the allocated machines. Users will then submit their
jobs to the jobtracker and use the hdfs controlled by the namenode.
The resulting hadoop environment is transient since, as far as GE is
concerned, it is simply a parallel job. How long it actually lives
depends on how you set up your queues.
We have developed a patch to add Gridengine support to hadoop hod,
http://issues.apache.org/jira/browse/HADOOP-6369
This places few demands on the GE version, but it is not very efficient
hdfs-wise, since gridengine is unaware of hdfs data locality. In
practice, either you ask hod to use an independent hdfs that is always
up -- but then there is no guarantee that the tasktracker nodes will be
close to the data -- or you upload your data to a new hdfs that will be
created by hod.

Thus, 1/ is definitely more efficient and 'cluster-wide' while 2/ is
more like a sort of cluster partitioning.
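
To make the locality difference concrete, here is a small sketch. It is
not code from GE or hod: the block map, the node names and both helper
functions are hypothetical, and the "pick the hosts holding the most
blocks" ranking is only a naive stand-in for whatever placement the
scheduler really does. It compares an allocation chosen with knowledge
of where the hdfs blocks live (as in 1/) with one chosen without that
knowledge (as in 2/ against an always-up hdfs).

```python
from collections import Counter


def pick_hosts_near_data(block_hosts, n_hosts):
    """Naively pick the n_hosts hosts that hold the most input blocks."""
    counts = Counter(h for replicas in block_hosts.values() for h in replicas)
    # Sort by block count (descending), then by name so ties are deterministic.
    ranked = sorted(counts, key=lambda h: (-counts[h], h))
    return ranked[:n_hosts]


def local_fraction(block_hosts, allocated):
    """Fraction of blocks with at least one replica on an allocated host."""
    allocated = set(allocated)
    if not block_hosts:
        return 0.0
    local = sum(1 for replicas in block_hosts.values()
                if allocated & set(replicas))
    return local / len(block_hosts)


# Hypothetical map: block id -> hosts holding a replica of that block.
block_hosts = {
    "blk_1": ["node1", "node2", "node3"],
    "blk_2": ["node1", "node2", "node4"],
    "blk_3": ["node2", "node3", "node5"],
    "blk_4": ["node1", "node3", "node6"],
}

near = pick_hosts_near_data(block_hosts, 2)   # locality-aware choice
blind = ["node5", "node6"]                    # choice made without hdfs info

print(near, local_fraction(block_hosts, near))
print(blind, local_fraction(block_hosts, blind))
```

With this made-up block map the locality-aware pair covers every block
locally, while the blind pair covers only half of them; the rest would
have to be read over the network.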



--gianluigi







On Wed, 2009-12-09 at 12:43 -0800, himanshu chandola wrote:
> Hi all,
> We are integrating the hadoop jobs with the sun grid engine. Most of
> the map reduce jobs that start on our cluster are sequential map and
> reduce. I also found integration guidelines
> here :http://blogs.sun.com/templedf/entry/beta_testing_the_sun_grid
> and http://blogs.sun.com/ravee/entry/creating_hadoop_pe_under_sge .
> 
> I wanted to know whether every sequential map-reduce job would be
> counted as a separate job to sun sge. That's necessary because in
> total the sequential map-reduce runs for days.
> 
> Thanks
> H
> 
>  Morpheus: Do you believe in fate, Neo?
> Neo: No.
> Morpheus: Why Not?
> Neo: Because I don't like the idea that I'm not in control of my life.
> 
> 
