hadoop-common-user mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: HDFS, client caches and transfer speeds
Date Tue, 07 Jul 2009 19:49:58 GMT
william kinney wrote:
> Hi,
> I have a high rate of data coming in that I'm constantly writing to my
> HDFS (say 2MB/s). I tested even higher rates (70 MB/s) and was
> surprised that it was able to perform so well (via 8 threads in a
> multi-core machine).
> However, I just found this note in the HDFS docs: "In fact, initially
> the HDFS client caches the file data into a temporary local file".

This doc is outdated. The client does not write to a temporary local file.

As Ted mentioned, if your client is also a datanode (I suspect it is not),
then HDFS currently writes one replica to the local datanode.

> Does that mean the rates that I'm seeing are not the rates at which
> files are copied into the HDFS, but rather the rate at which my hdfs
> client is just copying to /tmp ?
> And if so, if I had a much higher rate (e.g. 70 MB/s), wouldn't I see
> potential issues in the HDFS client trying to keep up w/ the local
> copy?
> Thanks,
> Will
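
To check what rate a client actually sustains, one option is to time the writes together with the final close(), so that anything still buffered on the client side is counted. The following is only a minimal sketch, not part of HDFS: the class ThroughputProbe and the method measureMBps are hypothetical names, and the in-memory sink in main() stands in for a real HDFS stream. In a real test you would presumably pass the stream returned by FileSystem.create(Path) instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper, not part of Hadoop: times writes to any OutputStream.
public class ThroughputProbe {

    /**
     * Writes totalBytes to out in pieces of at most chunkSize bytes, closes
     * the stream, and returns the observed rate in MB/s (1 MB = 1024 * 1024
     * bytes). close() is inside the timed region on purpose.
     */
    public static double measureMBps(OutputStream out, long totalBytes, int chunkSize)
            throws IOException {
        byte[] chunk = new byte[chunkSize];
        long remaining = totalBytes;
        long start = System.nanoTime();
        try (OutputStream o = out) {
            while (remaining > 0) {
                int n = (int) Math.min(chunk.length, remaining);
                o.write(chunk, 0, n);
                remaining -= n;
            }
        } // try-with-resources closes the stream here, before the clock stops
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        double megabytes = totalBytes / (1024.0 * 1024.0);
        return megabytes / (elapsedNanos / 1e9);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: an in-memory sink keeps the sketch self-contained.
        // For HDFS, replace it with the stream from FileSystem.create(Path).
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        double rate = measureMBps(sink, 8L * 1024 * 1024, 64 * 1024);
        System.out.printf("wrote %d bytes at %.1f MB/s%n", sink.size(), rate);
    }
}
```

With several writer threads, as in your 8-thread test, each thread would need its own stream and file, and the per-thread rates would be summed over the same wall-clock interval.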
