Okay, thanks. I -hoped- it was this way.
Sadly, all my files are small (the largest are around 40MB). But oh well!
-j
On May 10, 2011, at 10:46 AM, Matthew Foley wrote:
> Will's right, meta-data transactions go through the Namenode, but all the content data
> read/write activity is directly between Clients and Datanodes, and replication activity is
> Datanode-to-Datanode. No bottlenecks, as long as your Namenode has enough RAM to
> hold the namespace in memory, and enough cores to handle a modestly high transaction
> rate.
>
> And if the individual data files are large (Hadoop-scale "large", that is :-) ), you can even
> decrease the meta-data/data ratio by increasing the block size from the default 64MB
> to 128MB or even 256MB.
>
> --Matt
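
To put rough numbers on the meta-data/data ratio point, here is a minimal sketch in Python. It assumes the Namenode keeps one in-memory object per file plus one per block (directories are ignored), and it uses a commonly quoted rule of thumb of about 150 bytes of heap per object. That figure and all the function names are assumptions for illustration only, not anything taken from Hadoop itself.

```python
MB = 1024 * 1024

# ASSUMPTION: rough rule-of-thumb heap cost per namespace object.
# The real figure depends on the Hadoop version and the JVM.
BYTES_PER_OBJECT = 150


def block_count(file_size_bytes, block_size_bytes):
    """Number of blocks one file occupies (0 for an empty file)."""
    if file_size_bytes <= 0:
        return 0
    # Integer ceiling division: a partial last block still counts as a block.
    return -(-file_size_bytes // block_size_bytes)


def namenode_objects(file_sizes, block_size_bytes):
    """One object per file plus one per block (hypothetical simplification)."""
    return sum(1 + block_count(size, block_size_bytes) for size in file_sizes)


def namenode_heap_estimate(file_sizes, block_size_bytes):
    """Very rough heap estimate in bytes, under the assumptions above."""
    return namenode_objects(file_sizes, block_size_bytes) * BYTES_PER_OBJECT


if __name__ == "__main__":
    small_files = [40 * MB] * 1000     # files like the ~40MB ones in this thread
    large_files = [1024 * MB] * 1000   # hypothetical 1GB files
    for label, files in (("40MB files", small_files), ("1GB files", large_files)):
        for block_mb in (64, 128):
            objects = namenode_objects(files, block_mb * MB)
            print(label, "block size", block_mb, "MB:", objects, "objects")
```

Under these assumptions a 40MB file occupies a single block at either 64MB or 128MB, so a larger block size changes nothing for it, while a 1GB file drops from 16 blocks to 8.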
>
>
> On May 10, 2011, at 6:03 AM, Will Maier wrote:
>
> Hi Jonathan-
>
> On Tue, May 10, 2011 at 05:50:03AM -0700, Jonathan Disher wrote:
>> I will preface this with a couple statements: a) it's almost 6am, and I've
>> been up all night b) I'm drugged up from an allergic reaction, so I may not be
>> firing on all 64 bits.
>>
>> Do I correctly understand the HDFS architecture in that the namenode is a
>> network bottleneck into the system? I.e., it doesn't really matter how many
>> ethernet interfaces I roll into my data nodes, I will always be limited in
>> how much traffic I can drive to the HDFS pool by the network capacity of the
>> namenode?
>
> No. This diagram should help:
>
> http://hadoop.apache.org/hdfs/docs/current/hdfs_design.html#NameNode+and+DataNodes
>
> The Namenode is a single point of failure, not (under most imaginable
> conditions) a bottleneck.
>
>> I am trying to move a -lot- of data, and I'd like to not throttle the namenode
>> (especially in the old cluster, where I cannot just bond up more interfaces).
>> If there's a way to spread the inbound network (for block writes) traffic I'd
>> love to hear it.
>
> During our (highly distributed) migration, we were writing into HDFS at up to 5 GB/s.
> The more datanodes and writers you have, the faster your aggregate throughput.
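
A back-of-the-envelope way to see why throughput scales with datanodes and writers is sketched below in Python. This is a deliberately simplified model and not a measurement: it assumes block data never crosses the Namenode, that every byte written lands on `replication` datanodes (3 is the usual default), and that network links are the only limit. The function name and the example numbers are hypothetical.

```python
def aggregate_write_ceiling(num_writers, writer_mbps,
                            num_datanodes, datanode_mbps,
                            replication=3):
    """Rough upper bound on aggregate client write rate, in the same units
    as the per-node rates passed in.

    ASSUMPTIONS (simplified model):
      - the Namenode link is not on the data path, so it does not appear here;
      - each client byte is received by `replication` datanodes;
      - disks, switches and CPU are never the limiting factor.
    """
    # What all the writers together can push out.
    client_side = num_writers * writer_mbps
    # What the datanodes can absorb, divided by the number of copies per byte.
    datanode_side = num_datanodes * datanode_mbps / replication
    return min(client_side, datanode_side)


if __name__ == "__main__":
    # Hypothetical cluster: 40 writers and 60 datanodes, 100 MB/s links each.
    print(aggregate_write_ceiling(40, 100, 60, 100))
    # Doubling the datanodes raises the ceiling; the Namenode is never a term.
    print(aggregate_write_ceiling(40, 100, 120, 100))
```

In this model, adding interfaces to the Namenode does nothing for bulk throughput; adding datanodes or writers is what raises the ceiling.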
>
> --
>
> Will Maier - UW High Energy Physics
> cel: 608.438.6162
> tel: 608.263.9692
> web: http://www.hep.wisc.edu/~wcmaier/