hadoop-general mailing list archives

From Scott Carey <sc...@richrelevance.com>
Subject Re: dfs.data.dir
Date Wed, 28 Apr 2010 17:31:23 GMT

On Apr 26, 2010, at 9:45 AM, Steve Loughran wrote:

> Allen Wittenauer wrote:
>> On Apr 22, 2010, at 5:41 AM, Steve Loughran wrote:
>>> that brings up a couple of issues I've been thinking about now that
>>> workers can go to 6+ HDDs/node
>>> * a way to measure the distribution across disks, rather than just
>>> nodes. DfsClient doesn't provide enough info here yet.
>> What should probably happen is that instead of throwing you to the file
>> browser, clicking on a host from the live nodes page should probably put
>> you on a "stats about this node" page.
> I don't want to do any of this by hand. I want machine readable content 
> something can aggregate over time.
>>> * a way to trigger some rebalancing on a single node, to say "position
>>> stuff more fairly". You don't need to worry about network traffic, just
>>> local disk load and CPU time, so it should be simpler.
>> Yup.  Working with 8 drives per node, it is interesting to see how
>> unbalanced the data gets after a while.  [Luckily, we have MR tmp space
>> segregated off so I'm sure it would be a lot worse if we didn't!]
>> Someone should file a jira. :)
> Especially if someone else offers to fix it.

Should be trivial to at least make the new block allocation choose which device to allocate
the block on with a weighted roulette algorithm instead of round-robin.
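
A rough sketch of what that could look like, in Python rather than the
DataNode's Java, and not taken from any Hadoop code. It assumes the weight
for each device is its free space in bytes; the name choose_volume and its
parameters are made up for illustration. Devices that cannot hold the block
are skipped, and the rest are picked with probability proportional to their
free space, so emptier disks fill faster than fuller ones.

```python
import random
from typing import Optional, Sequence


def choose_volume(free_bytes: Sequence[int], block_size: int,
                  rng: Optional[random.Random] = None) -> int:
    """Pick the index of the device that should receive the next block.

    Hypothetical helper: weights are assumed to be free bytes per device.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    rng = rng or random

    # Only devices with room for the whole block take part in the draw.
    candidates = [(i, free) for i, free in enumerate(free_bytes)
                  if free >= block_size]
    if not candidates:
        raise OSError("no device has room for the block")

    # Spin the wheel: each device owns a slice as wide as its free space.
    total = sum(free for _, free in candidates)
    spin = rng.random() * total  # uniform in [0, total)
    cumulative = 0
    for index, free in candidates:
        cumulative += free
        if spin < cumulative:
            return index

    # Only reachable through floating-point rounding at the upper edge.
    return candidates[-1][0]
```

With free space of, say, 100, 300 and 0 units, the second device should get
about three times as many new blocks as the first, and the full one none.
Round-robin would instead keep offering blocks to all of them equally.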
