hadoop-common-dev mailing list archives

From Andrew Wang <andrew.w...@cloudera.com>
Subject Re: Regarding moving specific blocks of data in HDFS
Date Wed, 18 Dec 2013 23:23:27 GMT
Hi Karthiek,

I haven't checked 1.0.4, but in 2.2.0 and onwards there's a setting you can
tweak:

dfs.datanode.balance.bandwidthPerSec

By default, it's set to just 1 MB/s, which is pretty slow; at that rate, a
60 MB block would take about 60 seconds to move, which matches what you're
seeing. Also, at least in 2.2.0, there's `hdfs dfsadmin -setBalancerBandwidth`,
which can be used to adjust this config property at runtime without a
datanode restart.
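In case it helps, here's a sketch of both ways to raise the limit. The 10 MB/s
value is just an illustrative choice, not a recommendation; the property takes
bytes per second:

```shell
# Persistent: set the throttle in hdfs-site.xml on each datanode
# (takes effect after a datanode restart):
#
#   <property>
#     <name>dfs.datanode.balance.bandwidthPerSec</name>
#     <value>10485760</value>   <!-- 10 MB/s; the default is 1048576 (1 MB/s) -->
#   </property>

# Runtime: push the new limit to all datanodes without a restart
# (lasts until the datanodes restart):
hdfs dfsadmin -setBalancerBandwidth 10485760
```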

Best,
Andrew


On Wed, Dec 18, 2013 at 2:40 PM, Karthiek C <karthiekc@gmail.com> wrote:

> Hi all,
>
> I am working on a research project where we are looking at algorithms to
> "optimally" distribute data blocks in HDFS nodes. The definition of what is
> optimal is omitted for brevity.
>
> I want to move specific blocks of a file that is *already* in HDFS. I am
> able to do this using the data transfer protocol (taking cues from the
> "Balancer" module), but the operation turns out to be very time-consuming.
> In my cluster setup, moving one block of data (approximately 60 MB) from
> data-node-1 to data-node-2 takes nearly 60 seconds, while a "dfs -put"
> that copies the same file from data-node-1's local file system to
> data-node-2 takes just 1.4 seconds.
>
> Any suggestions on how to speed up the movement of specific blocks?
> Bringing down the running time is very important for us because this
> operation may happen while executing a job.
>
> I am using Hadoop 1.0.4.
>
> Thanks in advance!
>
> Best,
> Karthiek
>
