hadoop-common-dev mailing list archives

From Chad Walters <c...@powerset.com>
Subject Re: Hadoop research
Date Mon, 25 Feb 2008 01:37:26 GMT
+1 to Jeff's suggestions, especially on locality. I'd love to see some rigorous work done so
that the scheduler could prefer distributing tasks to the nodes that are already hosting the
appropriate data. Generalizing this further so that a full vertical integration of HDFS, HBase,
and Map/Reduce could exploit maximal data locality would be even cooler.
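To make the locality idea concrete, here is a minimal Python sketch of a locality-preferring assignment step. It assumes each pending task knows which hosts and racks hold a replica of its input block; `Task` and `pick_task` are hypothetical names for illustration, not the JobTracker's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A pending map task (hypothetical model, not Hadoop's own task class)."""
    task_id: str
    replica_hosts: frozenset[str]  # hosts storing a replica of the input block
    replica_racks: frozenset[str]  # racks containing those hosts


def pick_task(pending: list[Task], host: str, rack: str) -> tuple[Task | None, str | None]:
    """Pick a task for a node with a free slot, preferring data locality.

    Preference order: node-local, then rack-local, then any pending task.
    Returns the chosen task and the locality level achieved.
    """
    for task in pending:
        if host in task.replica_hosts:
            return task, "node-local"
    for task in pending:
        if rack in task.replica_racks:
            return task, "rack-local"
    if pending:
        return pending[0], "off-rack"
    return None, None


if __name__ == "__main__":
    pending = [
        Task("t1", frozenset({"h3"}), frozenset({"r2"})),
        Task("t2", frozenset({"h1"}), frozenset({"r1"})),
    ]
    task, level = pick_task(pending, host="h1", rack="r1")
    print(task.task_id, level)  # t2 node-local
```

Research on this would mostly be about what the loop leaves out, e.g. whether to delay an off-rack assignment briefly in the hope that a better-placed node frees up.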


On 2/24/08 2:56 PM, "Jeff Hammerbacher" <jeff.hammerbacher@gmail.com> wrote:

Hey Jaideep,

One interesting direction for research would be more sophisticated
scheduling policies for the JobTracker to help improve locality and overall
cluster utilization.  The introduction of speculative execution is a step in
this direction; you could perhaps investigate the implications of different
speculative execution policies on different job types.
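One way to compare speculative execution policies is to express each one as a function from the running attempts to the tasks that should get a backup copy. The Python sketch below assumes each attempt reports a progress fraction and its elapsed time; `RunningTask`, both policy functions, and their thresholds (`gap=0.2`, `min_runtime=60.0`) are illustrative assumptions, not the JobTracker's actual code or defaults.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RunningTask:
    task_id: str
    progress: float  # fraction complete, 0.0 to 1.0
    elapsed: float   # seconds since this attempt started


def gap_policy(tasks: list[RunningTask], gap: float = 0.2,
               min_runtime: float = 60.0) -> list[str]:
    """Speculate attempts whose progress trails the mean by more than `gap`."""
    if not tasks:
        return []
    mean = sum(t.progress for t in tasks) / len(tasks)
    return [t.task_id for t in tasks
            if t.elapsed >= min_runtime and t.progress < mean - gap]


def time_left_policy(tasks: list[RunningTask], max_backups: int = 1,
                     min_runtime: float = 60.0) -> list[str]:
    """Speculate the attempts with the longest estimated remaining time.

    Remaining time is estimated as (1 - progress) / rate,
    where rate = progress / elapsed (elapsed > 0 is guaranteed by min_runtime).
    """
    def time_left(t: RunningTask) -> float:
        rate = t.progress / t.elapsed
        return float("inf") if rate == 0 else (1.0 - t.progress) / rate

    candidates = [t for t in tasks if t.elapsed >= min_runtime and t.progress < 1.0]
    candidates.sort(key=time_left, reverse=True)
    return [t.task_id for t in candidates[:max_backups]]


if __name__ == "__main__":
    running = [
        RunningTask("a", 0.90, 100.0),
        RunningTask("b", 0.85, 100.0),
        RunningTask("c", 0.30, 100.0),
        RunningTask("d", 0.10, 30.0),  # too young to speculate under either policy
    ]
    print(gap_policy(running))                       # ['c']
    print(time_left_policy(running, max_backups=2))  # ['c', 'b']
```

The first policy compares each attempt to the average, so it can also flag an attempt that simply started late; the second ranks attempts by estimated time to finish, so it targets attempts that are actually running slowly.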


On Sun, Feb 24, 2008 at 9:41 AM, Jaideep Dhok <jaideep.dhok@gmail.com> wrote:

> Hello,
> I am a graduate research student in CS at the Search and Information
> Extraction Lab, IIIT Hyderabad, India (http://search.iiit.ac.in). I have
> been working on Nutch and Hadoop for the past couple of months, mainly to
> get an understanding of the platform and to discover possible research
> areas for my thesis work. Most of the time I have been playing with the
> Hadoop code base, and by now I am quite familiar with its internals
> (especially the Map-Reduce part).
> I have been reading publications related to Map-Reduce, the Google File
> System, etc., and I am still looking for interesting research topics. I
> was wondering if anyone would like to share or suggest ideas related to
> the Hadoop platform. Any suggestions and comments are greatly appreciated.
> Thanks and Regards,
> Jaideep Dhok,
