hadoop-general mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: dfs.data.dir
Date Mon, 26 Apr 2010 16:45:44 GMT
Allen Wittenauer wrote:
> On Apr 22, 2010, at 5:41 AM, Steve Loughran wrote:
>> that brings up a couple of issues I've been thinking about now that
>> workers can go to 6+ HDDs/node
>>
>> * a way to measure the distribution across disks, rather than just
>> nodes. DfsClient doesn't provide enough info here yet.
> 
> What should probably happen is that instead of throwing you to the
> file browser, clicking on a host from the live nodes page should
> probably put you on a "stats about this node" page.

I don't want to do any of this by hand. I want machine-readable content 
that something can aggregate over time.
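
A minimal sketch of what such machine-readable output could look like, 
assuming a script run on each worker that is handed the list of 
directories configured in dfs.data.dir. The function name 
disk_usage_report and the JSON field names are hypothetical, not part of 
Hadoop. It also reports filesystem-level usage via shutil.disk_usage, 
which is only a proxy for how HDFS blocks are spread across the disks.

```python
import json
import shutil
import time


def disk_usage_report(data_dirs, now=None):
    """Build a JSON-serializable per-directory usage report.

    data_dirs: paths assumed to match the node's dfs.data.dir entries.
    now: optional timestamp override; defaults to the current time.
    """
    entries = []
    for d in data_dirs:
        usage = shutil.disk_usage(d)  # named tuple: total, used, free (bytes)
        fraction = usage.used / usage.total if usage.total else 0.0
        entries.append({
            "dir": d,
            "total_bytes": usage.total,
            "used_bytes": usage.used,
            "used_fraction": fraction,
        })
    fractions = [e["used_fraction"] for e in entries]
    # Gap between the fullest and emptiest disk: a crude imbalance measure.
    spread = max(fractions) - min(fractions) if fractions else 0.0
    return {
        "timestamp": time.time() if now is None else now,
        "disks": entries,
        "spread": spread,
    }


if __name__ == "__main__":
    # Hypothetical invocation: replace "." with the real data directories.
    print(json.dumps(disk_usage_report(["."]), indent=2))
```

Collected periodically from every node, records like this could be 
aggregated to track how the per-disk spread drifts over time.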

> 
>> * a way to trigger some rebalancing on a single node, to say
>> "position stuff more fairly". You don't need to worry about network
>> traffic, just local disk load and CPU time, so it should be simpler.
> 
> 
> Yup.  Working with 8 drives per node, it is interesting to see how
> unbalanced the data gets after a while.  [Luckily, we have MR tmp
> space segregated off so I'm sure it would be a lot worse if we didn't!]
> 
> Someone should file a jira. :)

Especially if someone else offers to fix it.

