hadoop-common-user mailing list archives

From "Lei Chen" <lch...@gmail.com>
Subject Re: How is big file got divided
Date Thu, 20 Apr 2006 09:04:49 GMT
Thanks, Arbow

I checked the code and also ran some experiments. It seems that a big
file can indeed be split in the middle of a line. But map/reduce will still
work properly, since the dfs layer hides the block layout information from
the map/reduce tasks.
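To make the record-boundary handling concrete, here is a rough Python sketch
(not actual Hadoop code; `read_lines_for_split` is a made-up name) of the
usual trick a line-oriented record reader uses: each split claims exactly the
lines that *start* inside it, skipping a leading partial line and reading past
the split's end to finish its last line.

```python
# Rough sketch (not Hadoop code): a line-oriented reader recovers whole
# lines from a byte-range "split" even when the split boundary falls in
# the middle of a line.

def read_lines_for_split(data: bytes, start: int, end: int):
    """Yield every line whose first byte lies in [start, end).

    A line that begins before `end` but finishes after it is read in
    full, past the split boundary; a partial line at the front of the
    split is skipped, because the previous split's reader owns it.
    """
    if start == 0:
        pos = 0
    else:
        # Skip ahead to the first line that begins at or after `start`.
        # Searching from start - 1 keeps a line that starts exactly at
        # `start` (i.e. when data[start - 1] is a newline).
        nl = data.find(b"\n", start - 1)
        pos = nl + 1 if nl != -1 else len(data)
    while pos < min(end, len(data)):
        nl = data.find(b"\n", pos)
        if nl == -1:
            yield data[pos:]          # last line of the file, no newline
            return
        yield data[pos:nl + 1]        # whole line, may end past `end`
        pos = nl + 1
```

For example, splitting b"alpha\nbeta\ngamma\ndelta" into three 8-byte
ranges and concatenating what each range's reader yields reconstructs the
file exactly, even though the cut points fall mid-line.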

Lei

On 4/20/06, Arbow <avindev@gmail.com> wrote:
>
> Hi, Lei Chen:
>
> You can take a look at org.apache.hadoop.mapred.InputFormatBase; I
> think it will help you.
>
> On 4/20/06, Lei Chen <lchen5@gmail.com> wrote:
> > Hi,
> >      I am a new user of hadoop. This project looks cool.
> >
> >      There is one question about MapReduce. I want to process a big
> > file. To my understanding, hadoop will partition a big file into blocks,
> > and each block is assigned to a worker. How does hadoop decide where to
> > cut those big files? Does it guarantee that each line in the input file
> > will be assigned to one block, and that no line will be divided into two
> > parts in different blocks?
> >
> > Lei
> >
> >
>
