hadoop-common-user mailing list archives

From Seunghwa Kang <s.k...@gatech.edu>
Subject Re: Data-local map tasks lower than Launched map tasks even with full replication
Date Fri, 17 Jul 2009 23:16:29 GMT
I checked with

bin/hadoop fs -stat "%n %r" input/*

part-00000 4
part-00001 4
part-00002 4
part-00003 4
part-00004 4
part-00005 4
part-00006 4
part-00007 4

and saw that the replication factor is 4.
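The `-stat "%n %r"` listing above can also be checked mechanically instead of by eye. This is a minimal sketch with hypothetical helper names (not part of Hadoop). It assumes one `<name> <replication>` pair per line, as in the listing. Note that `%r` reports the replication factor recorded for each file, not a count of live replicas, so this check cannot show whether the extra copies have actually been created.

```python
# Hypothetical helpers: parse the output of
#   bin/hadoop fs -stat "%n %r" input/*
# where each line is "<file name> <replication factor>".


def parse_stat_output(text):
    """Map each file name to the replication factor reported by -stat."""
    replication = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the last whitespace run so names containing spaces survive.
        name, rep = line.rsplit(None, 1)
        replication[name] = int(rep)
    return replication


def below_target(replication, target):
    """Return, sorted, the files whose recorded replication is below target."""
    return sorted(name for name, rep in replication.items() if rep < target)


listing = """part-00000 4
part-00001 4
part-00007 4
"""
print(below_target(parse_stat_output(listing), 4))  # []
```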

Also, I set the replication factor to 4 in hadoop-site.xml, ran stop-all.sh
and start-all.sh, reloaded the data, and re-ran the code, but I am still
getting the same result.
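For reference, the default replication factor for newly written files is the `dfs.replication` property in `hadoop-site.xml`. Because it is applied when a file is created, existing files keep their old factor, which is why the data has to be reloaded (or changed with `-setrep`). The sketch below is a hypothetical helper that writes and reads such a `<configuration>` file with Python's standard `xml.etree.ElementTree`. It only illustrates the file layout.

```python
import xml.etree.ElementTree as ET


def build_site_config(properties):
    """Build a Hadoop-style <configuration> document from a name -> value dict."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_site_config(xml_text):
    """Parse a <configuration> document back into a name -> value dict."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.findall("property")
    }


xml_text = build_site_config({"dfs.replication": 4})
print(xml_text)
# <configuration><property><name>dfs.replication</name><value>4</value></property></configuration>
print(read_site_config(xml_text))  # {'dfs.replication': '4'}
```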

I searched hadoop-default.xml and found the following property description:

Specifies the maximum amount of bandwidth that each datanode
can utilize for the balancing purpose in term of
the number of bytes per second.

1048576 bytes per second is 1 MB/s (about 8 Mbit/s), which is far lower
than the 1 Gbit/s available to my nodes. I am going to change this value
and see what happens.
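Since the description gives the limit in bytes per second, the unit conversion is worth doing explicitly. This is a small sketch with an illustrative function name. It converts a bytes-per-second value to MiB/s and Mbit/s and compares it with a 1 Gbit/s link.

```python
def describe_rate(bytes_per_sec, link_bits_per_sec=1_000_000_000):
    """Convert a bytes/s limit to MiB/s and Mbit/s, and compare it with a link speed."""
    mib_per_sec = bytes_per_sec / (1024 * 1024)            # binary megabytes per second
    mbit_per_sec = bytes_per_sec * 8 / 1_000_000           # decimal megabits per second
    link_fraction = bytes_per_sec * 8 / link_bits_per_sec  # share of the link's capacity
    return mib_per_sec, mbit_per_sec, link_fraction


mib, mbit, frac = describe_rate(1048576)
print(mib, mbit)      # 1.0 8.388608
print(f"{frac:.2%}")  # 0.84%
```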

Any other suggestions?

Thank you very much,


On Fri, 2009-07-17 at 16:07 -0700, Ted Dunning wrote:
> Does [hadoop fsck /] show any under-replicated files/blocks?  You
> may not have waited long enough after increasing the target replication
> factor.
> Another thing to watch out for in a production cluster is the
> distribution of blocks across nodes.  You should be careful to load data from
> outside the cluster to ensure random placement of file blocks.  That
> is critical for getting good locality.  This obviously doesn't apply
> to your situation with 4 replicas on 4 nodes.
> Todd's comment about -setrep is also very important to note.
> On Fri, Jul 17, 2009 at 3:57 PM, Seunghwa Kang <s.kang@gatech.edu> wrote:
>         Just for test purposes, I increased the replication factor to 4
>         and checked with 'hadoop fs -stat %r%n' that the input data
>         actually has a replication factor of 4, but found that the ratio
>         of data-local map tasks to launched map tasks is still around 80%
>         for 4 nodes.
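Ted's suggestion to look for under-replicated blocks can be scripted. `hadoop fsck /` prints a summary that includes an under-replicated block count. The sketch below assumes a summary line of the form `Under-replicated blocks: <count> (<percent> %)`, which should be checked against the output of the Hadoop version in use. The helper name and the sample text are illustrative only.

```python
import re


def under_replicated_blocks(fsck_output):
    """Return the under-replicated block count from an fsck summary, or None if absent."""
    match = re.search(r"Under-replicated blocks:\s*(\d+)", fsck_output)
    return int(match.group(1)) if match else None


# Illustrative excerpt only; the exact summary layout may vary between versions.
sample_summary = """Status: HEALTHY
 Under-replicated blocks:\t0 (0.0 %)
"""
count = under_replicated_blocks(sample_summary)
if count is None:
    print("no under-replicated summary line found")
elif count == 0:
    print("fully replicated")
else:
    print(f"under-replicated blocks: {count}")
```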
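The ratio discussed in this thread compares the two job counters named in the subject line, Data-local map tasks and Launched map tasks. Below is a minimal sketch of that calculation with a hypothetical helper. The counter values are made up to reproduce the roughly 80% figure and do not come from the actual job.

```python
def data_local_ratio(counters):
    """Fraction of launched map tasks that were data-local (0.0 if none launched)."""
    launched = counters.get("Launched map tasks", 0)
    local = counters.get("Data-local map tasks", 0)
    return local / launched if launched else 0.0


# Hypothetical counter values, chosen only to illustrate an ~80% ratio.
counters = {"Launched map tasks": 10, "Data-local map tasks": 8}
print(f"{data_local_ratio(counters):.0%}")  # 80%
```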
