hadoop-general mailing list archives

From Allen Wittenauer <awittena...@linkedin.com>
Subject Re: dfs.data.dir
Date Thu, 22 Apr 2010 19:59:18 GMT

On Apr 22, 2010, at 5:41 AM, Steve Loughran wrote:
> that brings up a couple of issues I've been thinking about now that workers
> can go to 6+ HDDs/node
> 
> * a way to measure the distribution across disks, rather than just nodes.
> DfsClient doesn't provide enough info here yet.

What should probably happen is that clicking on a host from the live nodes page puts you
on a "stats about this node" page instead of throwing you into the file browser.

> * a way to trigger some rebalancing on a single node, to say "position stuff
> more fairly". You don't need to worry about network traffic, just local disk
> load and CPU time, so it should be simpler.


Yup.  Working with 8 drives per node, it is interesting to see how unbalanced the data gets
after a while.  [Luckily, we have MR tmp space segregated off so I'm sure it would be a lot
worse if we didn't!]
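
For anyone who wants a rough number in the meantime, here is a minimal sketch of how
the per-disk skew on one node could be measured. It is not anything Hadoop ships: the
function names and the directory list are hypothetical, and it reports how full each
underlying volume is (via Python's shutil.disk_usage), which includes non-DFS data
sharing the same disk.

```python
import shutil


def used_fractions(data_dirs, usage_fn=shutil.disk_usage):
    """Map each directory to the used fraction of the volume it lives on."""
    fractions = {}
    for d in data_dirs:
        usage = usage_fn(d)  # object with .total, .used, .free in bytes
        fractions[d] = usage.used / usage.total
    return fractions


def spread(fractions):
    """Gap between the fullest and emptiest volume; 0.0 means perfectly even."""
    values = list(fractions.values())
    return max(values) - min(values)


if __name__ == "__main__":
    # Hypothetical stand-in for the node's dfs.data.dir entries;
    # substitute the real comma-separated list from the node's config.
    dirs = ["/"]
    fractions = used_fractions(dirs)
    for path, frac in sorted(fractions.items()):
        print(f"{path}: {frac:.1%} used")
    print(f"spread: {spread(fractions):.1%}")
```

A large spread between the fullest and emptiest drive is the kind of imbalance being
described here; the same numbers would be a natural fit for a per-node stats page.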

Someone should file a jira. :)

