hadoop-common-dev mailing list archives

From "Bryan Pendleton (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-64) DataNode should be capable of managing multiple volumes
Date Mon, 07 Aug 2006 23:54:16 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-64?page=comments#action_12426370 ] 
            
Bryan Pendleton commented on HADOOP-64:
---------------------------------------

It sounds like it's a different issue, but I'm still worried about the "treat read-only as
dead" option.

I regularly see old drives throw out enough IDE errors to get remounted read-only. Oftentimes,
a reboot will bring the drive back online just fine - and bring back the blocks that weren't
deleted from it. If that block number was reused in the meantime, we'll have a problem anyway,
regardless of whether we treat that block as readable once the initial read-only condition
is encountered. Meanwhile, we've possibly missed an opportunity to save a block from early
death, since a copy of it is still actually available.

I'll probably open another issue around this later on, once the current one is closed,
just to clarify. In the meantime, is there a realistic issue with block numbers being reused,
such that an old block coming back online would corrupt things? Should a datanode maybe be periodically
requesting the CRCs for its blocks, and checking to see if they still match? This generally
falls into the space of "a good idea" anyway, since, barring reading a block, there's no way
to tell if its disk has gone bad.
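
As a rough illustration only (nothing like this exists in the current DataNode code; BlockVerifier, the expectedCrcs map, and reportCorrupt are made-up names), a periodic scan along those lines might look something like:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.Map;
    import java.util.zip.CRC32;

    // Hypothetical sketch: periodically re-read each block file a datanode holds,
    // recompute its CRC32, and compare it against the checksum recorded when the
    // block was written. A mismatch, or an unreadable file, is a hint that the
    // disk under that block is going bad and the block should be re-replicated.
    public class BlockVerifier implements Runnable {
        private final Map<File, Long> expectedCrcs;   // block file -> CRC recorded at write time
        private final long scanIntervalMillis;

        public BlockVerifier(Map<File, Long> expectedCrcs, long scanIntervalMillis) {
            this.expectedCrcs = expectedCrcs;
            this.scanIntervalMillis = scanIntervalMillis;
        }

        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                for (Map.Entry<File, Long> e : expectedCrcs.entrySet()) {
                    try {
                        if (crcOf(e.getKey()) != e.getValue()) {
                            reportCorrupt(e.getKey());
                        }
                    } catch (IOException ioe) {
                        reportCorrupt(e.getKey());    // unreadable counts as corrupt
                    }
                }
                try {
                    Thread.sleep(scanIntervalMillis);
                } catch (InterruptedException ie) {
                    return;
                }
            }
        }

        private static long crcOf(File f) throws IOException {
            CRC32 crc = new CRC32();
            FileInputStream in = new FileInputStream(f);
            try {
                byte[] buf = new byte[64 * 1024];
                int n;
                while ((n = in.read(buf)) > 0) {
                    crc.update(buf, 0, n);
                }
            } finally {
                in.close();
            }
            return crc.getValue();
        }

        private void reportCorrupt(File f) {
            // In a real datanode this would notify the namenode so the block
            // could be re-replicated from a healthy copy; here we just log it.
            System.err.println("Block checksum mismatch or read failure: " + f);
        }
    }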

> DataNode should be capable of managing multiple volumes
> -------------------------------------------------------
>
>                 Key: HADOOP-64
>                 URL: http://issues.apache.org/jira/browse/HADOOP-64
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.2.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>            Priority: Minor
>             Fix For: 0.6.0
>
>
> The dfs Datanode can only store data on a single filesystem volume. When a node runs
its disks JBOD, this means running a Datanode per disk on the machine. While the scheme works
reasonably well on small clusters, on larger installations (several hundred nodes) it implies
a very large number of Datanodes, with associated management overhead in the Namenode.
> The Datanode should be enhanced to be able to handle multiple volumes on a single machine.
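
For illustration, a multi-volume datanode needs some policy for deciding which local directory receives the next block. The sketch below is purely hypothetical (VolumeSet and chooseVolume are invented names, and the actual HADOOP-64 patch may pick volumes differently); it just shows a simple round-robin choice with a free-space check:

    import java.io.File;
    import java.io.IOException;
    import java.util.List;

    // Illustrative only: a datanode configured with several local directories
    // (one per physical disk) and a round-robin policy for choosing where the
    // next block is written.
    public class VolumeSet {
        private final List<File> volumes;   // e.g. /mnt/disk0/dfs, /mnt/disk1/dfs, ...
        private int next = 0;

        public VolumeSet(List<File> volumes) {
            this.volumes = volumes;
        }

        // Pick the next volume, round-robin, skipping any that lack room for the block.
        public synchronized File chooseVolume(long blockSize) throws IOException {
            for (int i = 0; i < volumes.size(); i++) {
                File v = volumes.get((next + i) % volumes.size());
                if (v.getUsableSpace() > blockSize) {
                    next = (next + i + 1) % volumes.size();
                    return v;
                }
            }
            throw new IOException("No volume has room for a " + blockSize + "-byte block");
        }
    }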

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
