hadoop-hdfs-user mailing list archives

From Vincent Boucher <vin.bouc...@gmail.com>
Subject Re: hdfs file - datanode association
Date Thu, 06 Oct 2011 15:44:47 GMT
Hi John, Hi Will,

Thank you for your answers.

John,
>[...]
>If you have two different NNs, why would you need separate libs etc? You
>could point to the respective namenode while accessing that namenode.

From the end users' point of view, writing from their code to either storage 
volume should be transparent, so that they don't have to deal with different 
configuration schemes.
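One way to keep a two-NameNode layout transparent to users is a thin client-side helper that maps each logical path prefix to the NameNode serving it, so user code only ever deals with /hdfs/ms/... or /hdfs/wn/... paths. The minimal Python sketch below assumes two hypothetical NameNode URIs (`hdfs://nn-ms:8020` and `hdfs://nn-wn:8020`); the names `MOUNTS` and `resolve` are invented for illustration and are not part of Hadoop.

```python
# Hypothetical mount table: logical path prefix -> NameNode URI.
# Hostnames and port are placeholders, not real cluster settings.
MOUNTS = {
    "/hdfs/ms": "hdfs://nn-ms:8020",
    "/hdfs/wn": "hdfs://nn-wn:8020",
}


def resolve(path: str) -> str:
    """Return the fully qualified URI for a logical path.

    Picks the longest matching prefix, matching only on whole path
    components so that e.g. "/hdfs/msx" does not match "/hdfs/ms".
    """
    best = None
    for prefix in MOUNTS:
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        raise ValueError(f"no NameNode configured for {path!r}")
    return MOUNTS[best] + path


if __name__ == "__main__":
    print(resolve("/hdfs/ms/run42/events.dat"))
    print(resolve("/hdfs/wn/tmp/histos.root"))
```

Later Hadoop releases ship a built-in client-side mount table (ViewFs) aimed at the same problem, so a hand-rolled helper like this would only be a stopgap.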


Will,
>Why do you want to partition your datanodes this way? In our cluster, 
>datanode size ranges from ~1 TB (no RAID) up to ~80TB (RAID). While in an
>ideal world all datanodes would be similar in configuration, we have not
>observed any issues with this arrangement in production.

A working node is much more likely to go down or to have its hosted data 
corrupted than a mass storage server (with RAID, redundant power supplies and 
Ethernet bonding).
The data hosted on the working nodes (Hadoop path /hdfs/wn) therefore has to be 
replicated (replication factor = 2), while replication is not necessary for 
the data stored on the mass storage servers.
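To make the trade-off concrete: the usable capacity of each area is its raw capacity divided by its replication factor. The quick Python check below uses the server counts from the original message (10 x 50TB and 30 x 6TB) and assumes replication 1 for /hdfs/ms and 2 for /hdfs/wn; `usable_tb` is an illustrative helper, and the figures ignore reserved and non-DFS space.

```python
def usable_tb(servers: int, tb_per_server: float, replication: int) -> float:
    """Usable HDFS capacity in TB: raw capacity divided by replication factor."""
    return servers * tb_per_server / replication


# Figures from the thread; replication factors as proposed above.
ms = usable_tb(servers=10, tb_per_server=50, replication=1)  # /hdfs/ms
wn = usable_tb(servers=30, tb_per_server=6, replication=2)   # /hdfs/wn

print(f"/hdfs/ms: {ms:.0f} TB usable")  # 500 TB
print(f"/hdfs/wn: {wn:.0f} TB usable")  # 90 TB
```

The per-path replication factor itself does not require any NameNode change: `hadoop fs -setrep -R 2 /hdfs/wn` sets it recursively on existing files, while new files get the writer's `dfs.replication` setting unless a factor is passed explicitly.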

On Thursday 06 October 2011, Will Maier wrote:
> Hi Vincent-
>
> On Thu, Oct 06, 2011 at 11:19:20AM +0200, Vincent Boucher wrote:
> > We are wondering whether it is possible to have the namenode direct
> > the blocks of the files in a given directory to a particular set of
> > datanodes?
> >
> > Our case is the following:
> >
> >  - Servers
> >     10 x mass storage servers of 50TB each, RAID6
> >        -> 500TB available for hdfs
> >     30 x working nodes with 6TB (no RAID)
> >        -> 180TB available for hdfs
> >
> > We'd like the files stored in
> >    /hdfs/ms
> > to be hosted on the mass storage (ms) servers,
> > and the files in
> >   /hdfs/wn
> > to be hosted on the working nodes (wn).
>
> Why do you want to partition your datanodes this way? In our cluster,
> datanode size ranges from ~1 TB (no RAID) up to ~80TB (RAID). While in an
> ideal world all datanodes would be similar in configuration, we have not
> observed any issues with this arrangement in production.
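The behaviour asked for in the original question can be modelled as a placement rule keyed on the path prefix: each prefix owns a group of datanodes, and the replicas of a new block are drawn only from that group. The toy Python model below only illustrates the idea; it is not the NameNode's actual placement code, and the hostnames (`ms01`, `wn01`, ...) and the function `choose_targets` are hypothetical.

```python
import random

# Hypothetical node groups, sized after the cluster described in the thread.
NODE_GROUPS = {
    "/hdfs/ms": [f"ms{i:02d}" for i in range(1, 11)],  # 10 mass storage servers
    "/hdfs/wn": [f"wn{i:02d}" for i in range(1, 31)],  # 30 working nodes
}


def choose_targets(path: str, replication: int, rng: random.Random) -> list[str]:
    """Pick `replication` distinct datanodes for a new block of `path`.

    Only nodes in the group owning the path's prefix are eligible, so
    /hdfs/ms data never lands on a working node and vice versa.
    """
    for prefix, nodes in NODE_GROUPS.items():
        if path == prefix or path.startswith(prefix + "/"):
            if replication > len(nodes):
                raise ValueError("replication exceeds group size")
            return rng.sample(nodes, replication)
    raise ValueError(f"no node group for {path!r}")


if __name__ == "__main__":
    rng = random.Random(0)
    print(choose_targets("/hdfs/ms/run42/events.dat", 1, rng))
    print(choose_targets("/hdfs/wn/tmp/histos.root", 2, rng))
```

Any real version of this would also have to keep re-replication after a node failure inside the same group, not just the initial placement.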

