hadoop-common-user mailing list archives

From "Rick Cox" <rick....@gmail.com>
Subject Re: Question about input file breakdown
Date Mon, 15 Oct 2007 16:57:29 GMT
You can also gzip each input file. Hadoop will not split a
gzip-compressed input file, so each file goes to a single mapper
(and Hadoop will automatically decompress it before feeding it to
your mapper).
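
For illustration, here is a minimal sketch of the "gzip each input file"
step, written in Python with only the standard library. The function
names (gzip_file, gzip_inputs) and the flat input directory are
assumptions made for this example, not anything Hadoop provides. Note
that with a line-oriented input format the mapper still receives one
line per call; what the compression buys you is that all lines of a
file reach the same mapper.

```python
import gzip
import shutil
import sys
from pathlib import Path
from typing import List


def gzip_file(src: Path) -> Path:
    """Write a gzip-compressed copy of src next to it and return its path."""
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def gzip_inputs(input_dir: str) -> List[Path]:
    """Compress every regular file directly under input_dir that is not already .gz."""
    compressed = []
    for path in sorted(Path(input_dir).iterdir()):
        if path.is_file() and path.suffix != ".gz":
            compressed.append(gzip_file(path))
    return compressed


if __name__ == "__main__":
    # Hypothetical usage: python gzip_inputs.py /path/to/local/input
    for out in gzip_inputs(sys.argv[1]):
        print(out)
```

The sketch leaves the uncompressed originals in place, so only the .gz
copies should be uploaded as job input; otherwise each file would be
processed twice.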

rick

On 10/15/07, Ted Dunning <tdunning@veoh.com> wrote:
>
>
> Use a list of file names as your map input.  Your mapper can then read a
> line and use it to open and read the named file for processing.
>
> This is similar to the problem of web-crawling, where the input is a list of
> URLs.
>
> On 10/15/07 6:57 AM, "Ming Yang" <minghsien@gmail.com> wrote:
>
> > I was writing a test mapreduce program and noticed that the
> > input file was always broken down into separate lines and fed
> > to the mapper. However, in my case I need to process the whole
> > file in the mapper since there are dependencies between
> > lines in the input file. Is there any way I can achieve this --
> > process the whole input file, either text or binary, in the mapper?
>
>
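
The file-name-list approach quoted above can be sketched as a mapper in
the style of Hadoop Streaming, which reads input lines on stdin and
writes tab-separated key/value pairs on stdout. This is only a sketch
under assumptions: each input line is taken to be a path that the task
can open directly (files on HDFS would have to be read through an HDFS
client instead), and process_whole_file is a hypothetical stand-in for
the real per-file work.

```python
import sys
from typing import Iterable, Iterator, Tuple


def process_whole_file(path: str) -> int:
    """Hypothetical per-file work: here it just counts the lines in the file."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def map_file_names(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Treat each input line as a file name and process that file as a whole."""
    for line in lines:
        path = line.strip()
        if not path:
            continue  # skip blank lines in the file list
        yield path, process_whole_file(path)


if __name__ == "__main__":
    # Emit one "path<TAB>result" record per input file name.
    for path, result in map_file_names(sys.stdin):
        print(f"{path}\t{result}")
```

Because the map input is only the list of names, splitting that list
across mappers is harmless: every mapper still sees each of its files
in full.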
