hadoop-common-user mailing list archives

From "Yang Chen" <chenyangyinp...@gmail.com>
Subject Re: one input file per map
Date Thu, 03 Jul 2008 16:12:36 GMT
Maybe consider a hierarchy. The first level is one map per file, and the
second level is a map/reduce job for the parent level.

YC


On 7/3/08, Jason Venner <jason@attributor.com> wrote:
>
> You could also set your input split size to Long.MAX_VALUE.
>
> Goel, Ankur wrote:
>
>> Nope. But if that is the intent, there are two ways of doing it.
>>
>> 1. Just extend the input format of your choice and override
>> isSplitable() method to return false.
>>
>> 2. Compress your text file using a compression format supported by
>> Hadoop (e.g. gzip). This ensures that one map task processes one file,
>> since compressed files are not split between map tasks.
>>
>>
>> -----Original Message-----
>> From: Qiong Zhang [mailto:jamesz@yahoo-inc.com]
>> Sent: Tuesday, July 01, 2008 9:54 PM
>> To: core-user@hadoop.apache.org
>> Subject: one input file per map
>> Hi,
>>
>>
>> Is there an existing input format/split which supports one input file
>> (e.g. plain text) per map task?
>>
>>
>> Thanks,
>>
>> James
>>
>>
>>
>
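
To see why both the isSplitable() override and the Long.MAX_VALUE split
size lead to one map per file, here is a rough, self-contained sketch of
the split arithmetic. It is a simplified model, not Hadoop source code:
the class SplitSketch, its method names, and the 1.1 slop factor are
assumptions made for illustration. In a real job the hook would be the
isSplitable() method of your input format, and the minimum split size
would be set through the job configuration (as far as I know, the
mapred.min.split.size property in releases of this period).

```java
// Simplified model of how a FileInputFormat-style class carves one file
// into splits. Hypothetical names; this does not use any Hadoop classes.
public class SplitSketch {

    // Assumed tolerance: a final chunk up to 10% larger than the split
    // size is kept as one split instead of being cut again.
    static final double SLOP = 1.1;

    // Split size is never below minSize; otherwise it is the smaller of
    // the requested goal size and the file system block size.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Number of splits (and therefore map tasks) produced for one file.
    static long countSplits(long fileLength, boolean splitable,
                            long goalSize, long minSize, long blockSize) {
        // A non-splitable (or empty) file always becomes exactly one split.
        if (!splitable || fileLength == 0) {
            return 1;
        }
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        long remaining = fileLength;
        long count = 0;
        while ((double) remaining / splitSize > SLOP) {
            remaining -= splitSize;
            count++;
        }
        if (remaining != 0) {
            count++; // the tail of the file
        }
        return count;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long file = 1024 * mib;   // a 1 GiB input file
        long block = 64 * mib;    // 64 MiB blocks

        // Default behaviour: roughly one split per block.
        System.out.println(countSplits(file, true, file, 1, block));
        // Option 1: isSplitable() returns false.
        System.out.println(countSplits(file, false, file, 1, block));
        // Jason's suggestion: minimum split size of Long.MAX_VALUE.
        System.out.println(countSplits(file, true, file, Long.MAX_VALUE, block));
    }
}
```

With these numbers the model yields 16 splits by default and a single
split in the other two cases. Compressing the file with gzip (option 2)
ends up in the same place as option 1, because the input format treats
the compressed file as non-splitable.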
