hadoop-common-dev mailing list archives

From Daming Wang <Daming.W...@autodesk.com>
Subject RE: Hadoop research
Date Tue, 26 Feb 2008 02:47:42 GMT
To my understanding, a decentralized strategy not only improves fault tolerance but can also
improve performance. It may need a more sophisticated way to control consistency, but a
decentralized architecture clearly has many advantages over centralized control. I suggest you
start with two papers: (1) amazon-dynamo-sosp2007 and (2) Beehive. You can get both from the internet.
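
For readers who have not seen the Dynamo paper mentioned above: it partitions data with consistent hashing and stores each key on the next few distinct nodes around the ring. The following is only a minimal sketch of that idea, not Dynamo's or HDFS's implementation. The names `HashRing` and `preference_list`, the number of virtual nodes, and the use of MD5 here are illustrative assumptions.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a position on the ring (hash choice is an assumption)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=8):
        # Each physical node owns several points on the ring to even out load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def preference_list(self, key, replicas=3):
        """Walk clockwise from the key's position and collect distinct nodes."""
        start = bisect.bisect_right(self._points, _hash(key))
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == replicas:
                    break
        return owners


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.preference_list("block-42"))  # three distinct nodes for this key
```

No single node holds the whole placement map, which is where the fault tolerance comes from. The price is that replicas can disagree, so some consistency protocol is still needed on top.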

Of course, decentralization will add complexity to the system. I just want to ask what kind
of research you want to do based on Hadoop. If it is a small improvement to a current
module, scheduling policies are fine, but if you want to research a relatively big improvement
to the whole architecture, how to adopt a decentralized strategy may be a direction, and it
could help you publish papers. :)



-----Original Message-----
From: Jaideep Dhok [mailto:jaideep.dhok@gmail.com]
Sent: Monday, February 25, 2008 8:53 PM
To: core-dev@hadoop.apache.org
Subject: Re: Hadoop research

Hi,
First of all thank you for your responses.

"One interesting direction for research would be more sophisticated
scheduling policies for the JobTracker to help improve locality and overall
cluster utilization."
This is a very interesting area. In fact I was trying a simple Round Robin
scheduler, but I didn't take data location into account.
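
To make the difference concrete, here is a rough sketch that puts plain round-robin assignment next to a placement that takes data location into account. It is not the JobTracker's code. The function names, the `block_locations` map, and the per-node slot count are all hypothetical and exist only for illustration.

```python
from itertools import cycle


def round_robin_assign(tasks, nodes):
    """Hand tasks to nodes in turn, ignoring where the input data lives."""
    node_cycle = cycle(nodes)
    return {task: next(node_cycle) for task in tasks}


def locality_aware_assign(tasks, nodes, block_locations, slots_per_node):
    """Prefer a node that stores the task's input; otherwise use any free node."""
    free = {node: slots_per_node for node in nodes}
    assignment = {}
    for task in tasks:
        local = [n for n in block_locations.get(task, []) if free.get(n, 0) > 0]
        candidates = local or [n for n in nodes if free[n] > 0]
        if not candidates:
            raise RuntimeError("no free slots left")
        # Among the candidates, pick the node with the most free slots.
        chosen = max(candidates, key=lambda n: free[n])
        free[chosen] -= 1
        assignment[task] = chosen
    return assignment


def local_fraction(assignment, block_locations):
    """Fraction of tasks placed on a node that holds their input block."""
    hits = sum(
        1 for task, node in assignment.items()
        if node in block_locations.get(task, [])
    )
    return hits / len(assignment)


nodes = ["n1", "n2", "n3"]
tasks = ["t1", "t2", "t3", "t4"]
block_locations = {"t1": ["n2"], "t2": ["n3"], "t3": ["n1"], "t4": ["n2", "n3"]}

rr = round_robin_assign(tasks, nodes)
la = locality_aware_assign(tasks, nodes, block_locations, slots_per_node=2)
print(local_fraction(rr, block_locations))  # 0.0 for this toy input
print(local_fraction(la, block_locations))  # 1.0 for this toy input
```

The toy input is chosen so that round robin misses every local block. On a real cluster the gap depends on replication and on how busy the nodes are.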

On Mon, Feb 25, 2008 at 8:32 AM, Daming Wang <Daming.Wang@autodesk.com>
wrote:

> How about applying a decentralized strategy to improve HDFS? Something
> like the o(N) DHT architecture used by Amazon S3.
> Of course, using a decentralized method to change Hadoop would be a huge
> amount of work, but I think it is a good direction as a research topic...

By a decentralized strategy do you mean a peer to peer system? Although that
would be very fault tolerant, wouldn't there be consistency and performance
issues?
If I understand correctly, the rationale behind the current centralized
architecture is that it keeps the system simple. Would it be useful to study
how much decentralization is possible without adversely affecting
performance?


Again, thanks a lot for your comments.

Regards,
Jaideep
