hadoop-common-user mailing list archives

From "Richard Zhang" <richardtec...@gmail.com>
Subject Re: InputSplit boundaries
Date Thu, 26 Jun 2008 23:11:29 GMT
By default, the file system block size is the upper bound of the split size.
The minimum split size can be configured by the user; if it is set above the
block size, the splits become larger than a block.
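
To make the arithmetic concrete, here is a minimal sketch in plain Java with no
Hadoop dependency. The formula max(minSize, min(goalSize, blockSize)) is what
FileInputFormat uses as far as I recall, and I believe the user-side knob is
the mapred.min.split.size property; treat both as assumptions and check them
against your Hadoop version. The class and the blocksSpanned helper are
hypothetical names made up for illustration.

```java
public class SplitSizeSketch {

    // Assumed split-size rule: the block size caps the split size unless the
    // user-supplied minimum is larger than the block size.
    public static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Hypothetical helper: number of file system blocks touched by the byte
    // range [offset, offset + length). Requires length > 0.
    public static long blocksSpanned(long offset, long length, long blockSize) {
        long firstBlock = offset / blockSize;
        long lastBlock = (offset + length - 1) / blockSize;
        return lastBlock - firstBlock + 1;
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        long blockSize = 64 * mb;
        // Goal size = total input size / requested number of maps (1 GB / 4).
        long goalSize = 1024 * mb / 4;

        // Default minimum (1 byte): the block size is the upper bound.
        long defaultSplit = computeSplitSize(goalSize, 1, blockSize);
        System.out.println(defaultSplit / mb);   // prints 64

        // Minimum raised to 100 MB: the split is now larger than a block.
        long raisedSplit = computeSplitSize(goalSize, 100 * mb, blockSize);
        System.out.println(raisedSplit / mb);    // prints 100

        // The second 100 MB split covers bytes [100 MB, 200 MB), which
        // touches blocks 1, 2 and 3 of a file with 64 MB blocks.
        System.out.println(blocksSpanned(raisedSplit, raisedSplit, blockSize)); // prints 3
    }
}
```

With the default minimum, every split lines up with exactly one block, so the
task can be scheduled next to its data. Once the minimum exceeds the block
size, a split touches several blocks that may live on different machines,
which is the locality cost discussed in the quoted thread below.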

On Thu, Jun 26, 2008 at 12:23 PM, Naama Kraus <naamakraus@gmail.com> wrote:

> Thanks for the input. Naama
>
> On Thu, Jun 26, 2008 at 2:12 PM, Amar Kamat <amarrk@yahoo-inc.com> wrote:
>
> > Naama Kraus wrote:
> >
> >> Hi,
> >>
> >> I have a question regarding InputSplit boundaries. Does an InputSplit
> >> necessarily fall within the boundaries of a single file system block?
> >>
> > No.
> >
> >> Or can it
> >> span multiple blocks?
> >>
> > Yes. It can span across blocks.
> >
> >> In particular, what about a FileSplit?
> >> If it spans blocks, could the blocks reside on different machines?
> >>
> > Yes.
> >
> >> If
> >> so, how would it affect locality of computations?
> >>
> >>
> > The remaining blocks get pulled to the machine that is executing the task.
> > AFAIK the blocks are streamed while the map task is executing, so there is
> > some amount of parallelism there. FYI, there is one optimization filed on
> > this. See https://issues.apache.org/jira/browse/HADOOP-3293.
> > Amar
> >
> >> Thanks, Naama
