hadoop-common-user mailing list archives

From elton sky <eltonsky9...@gmail.com>
Subject Re: BUG: Anyone use block size more than 2GB before?
Date Fri, 22 Oct 2010 04:47:14 GMT
Milind,

You are right, but that only happens when your client is running on one of
the datanodes in the HDFS cluster; otherwise a random node is picked for the
first replica.
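
For reference, here is a minimal sketch (not part of the original exchange;
the output path and payload are placeholders) of how a client can request a
2 GB block size and replication 3 when writing a file. When the client
process runs on a datanode, the first replica of every block it writes is
placed on that node, which is the situation Milind describes below.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LargeBlockWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Placeholder output path.
    Path out = new Path("/tmp/large-block-file");

    long blockSize = 2L * 1024 * 1024 * 1024; // 2 GB blocks, as in the thread
    short replication = 3;
    int bufferSize = conf.getInt("io.file.buffer.size", 4096);

    // FileSystem.create lets the client override replication and block size
    // for this one file instead of relying on the cluster-wide default.
    FSDataOutputStream stream =
        fs.create(out, true, bufferSize, replication, blockSize);
    try {
      stream.writeBytes("placeholder payload\n");
    } finally {
      stream.close();
    }
  }
}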

On Fri, Oct 22, 2010 at 3:37 PM, Milind A Bhandarkar
<milindb@yahoo-inc.com> wrote:

> If a file of, say, 12.5 GB were produced by a single task with replication
> 3, the default replication policy will ensure that the first replica of each
> block is created on the local datanode. So, there will be one datanode in
> the cluster that contains one replica of all blocks of that file. The map
> placement hint specifies that node.
>
> It's evil, I know :-)
>
> - Milind
>
> On Oct 21, 2010, at 1:30 PM, Alex Kozlov wrote:
>
> > Hmm, this is interesting: how did it manage to keep the blocks local? Why
> > was performance better?
> >
> > On Thu, Oct 21, 2010 at 11:43 AM, Owen O'Malley <omalley@apache.org> wrote:
> >
> >> The block sizes were 2G. The input format made splits that were more than
> >> a block because that led to better performance.
> >>
> >> -- Owen
> >>
>
> --
> Milind Bhandarkar
> (mailto:milindb@yahoo-inc.com)
> (phone: 408-203-5213 W)
>
>
>
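
As a footnote to the quoted discussion, here is a minimal sketch (assuming
the new mapreduce API; the job name and input path are hypothetical) of how
an input format can be asked for splits spanning several blocks, in the
spirit of what Owen describes, and where the host hint Milind mentions
comes from.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class LargeSplitJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "large-split-example"); // hypothetical job name

    job.setInputFormatClass(TextInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));

    // Ask for splits of at least ~6 GB, i.e. several 2 GB blocks per split.
    // Each split's getLocations() reports the datanodes holding its data;
    // the scheduler treats those hosts as a placement hint, so a node that
    // holds the first replica of every block of the file is the natural
    // target for the map.
    FileInputFormat.setMinInputSplitSize(job, 6L * 1024 * 1024 * 1024);

    // ... set mapper/reducer classes and submit as usual ...
  }
}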
