hadoop-common-user mailing list archives

From Gang Luo <lgpub...@yahoo.com.cn>
Subject Re: return in map
Date Sun, 06 Dec 2009 23:55:40 GMT
Thanks for the response.

It seems there was something wrong in my logic, and I have more or less solved it now. What I am
still unsure of is how to return or exit in a MapReduce program. If I want to skip one line (because
it doesn't satisfy some constraint, for example), using return to leave the map function is enough.
But what if I want to quit a map task (because of an error I detect, for example, that the file I
want to read doesn't exist)? If I use System.exit(), Hadoop will try to run the task again. Similarly,
if I catch an exception and want to quit the current task, what should I do?
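
A minimal sketch of the distinction described above, in plain Java rather than the Hadoop API. `LineFilter`, `runTask`, the empty-line constraint, and the "ERROR" marker are all hypothetical stand-ins (for a mapper, the framework's per-record loop, a filter condition, and a detected error). It only shows that `return` ends the handling of one record, while an uncaught exception ends the whole loop. In a real job, an exception thrown from map would be expected to fail that task attempt, which the framework may then retry up to the job's configured attempt limit.

```java
import java.util.ArrayList;
import java.util.List;

class LineFilter {
    // Hypothetical per-record function, standing in for a mapper's map() method.
    static void map(String line, List<String> out) {
        if (line.startsWith("ERROR")) {
            // Unrecoverable condition: abort the whole task, not just this record.
            throw new IllegalStateException("cannot continue: " + line);
        }
        if (line.isEmpty()) {
            return; // skip this record only; the next call proceeds normally
        }
        out.add(line); // the record passed the filter, so emit it
    }

    // Hypothetical stand-in for the framework loop that calls map() once per line.
    static List<String> runTask(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            map(line, out);
        }
        return out;
    }
}
```

With the input "a", "", "b", `runTask` returns "a" and "b"; with an "ERROR" line in the input, the exception propagates out of `runTask` and no later line is processed.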


----- Original Message -----
From: Edmund Kohlwey <ekohlwey@gmail.com>
To: common-user@hadoop.apache.org
Sent: 2009/12/6 (Sunday) 10:52:40 AM
Subject: Re: return in map

Let me see if I understand:
The mapper is reading lines in a text file. You want to see if a single
line meets a given criterion, and emit all the lines whose index is
greater than or equal to the single matching line's index. I'll assume
that if more than one line meets the criterion, you have a different
condition which you will handle appropriately.

First, some discussion of your input: is this a single file that should
be considered as a whole? In that case, you probably only want one
mapper, which, depending on your reduce task, may totally invalidate the
use case for MapReduce. You may just want to read the file directly from
HDFS and write to HDFS in whatever application is using the data.

Anyways, here's how I'd do it. In setup, open a temporary file (it can
be directly on the node, or on HDFS, although directly on the node is
preferable). Use map to perform your test, and keep a counter of how
many lines match. After the first line matches, begin saving lines. If a
second line matches, log the error condition or whatever. In cleanup, if
only one line matched, open your temp file and begin emitting the lines
you saved from earlier.
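
The setup/map/cleanup steps above can be sketched roughly as follows. This is a plain-Java illustration, not Hadoop code: `BufferingFilter`, `run`, and the `criterion` predicate are hypothetical names, and the three methods only mirror the roles of a mapper's setup, map, and cleanup hooks. It assumes the temporary file fits on the local disk.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class BufferingFilter {
    private final Predicate<String> criterion; // hypothetical test applied to each line
    private Path tempFile;
    private BufferedWriter buffer;
    private int matches = 0;

    BufferingFilter(Predicate<String> criterion) {
        this.criterion = criterion;
    }

    // setup: open a temporary file on the local disk
    void setup() throws IOException {
        tempFile = Files.createTempFile("saved-lines", ".txt");
        buffer = Files.newBufferedWriter(tempFile);
    }

    // map: count matching lines; from the first match onward, save every line
    void map(String line) throws IOException {
        if (criterion.test(line)) {
            matches++;
            if (matches > 1) {
                System.err.println("more than one matching line: " + line);
            }
        }
        if (matches >= 1) {
            buffer.write(line);
            buffer.newLine();
        }
    }

    // cleanup: emit the saved lines only if exactly one line matched
    List<String> cleanup() throws IOException {
        buffer.close();
        List<String> emitted = new ArrayList<>();
        if (matches == 1) {
            emitted.addAll(Files.readAllLines(tempFile));
        }
        Files.delete(tempFile);
        return emitted;
    }

    // Hypothetical driver standing in for the framework's setup/map/cleanup calls.
    static List<String> run(List<String> lines, Predicate<String> criterion) {
        try {
            BufferingFilter f = new BufferingFilter(criterion);
            f.setup();
            for (String line : lines) {
                f.map(line);
            }
            return f.cleanup();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Nothing is emitted until cleanup because only then is it known whether a second line matched; that is why the lines have to be buffered somewhere in the meantime.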

There are a few considerations in your implementation:
1. File size. If the temporary file exceeds the available local space on the
mapper's node, you can make a temp file in HDFS, but this is far from ideal.
2. As noted above, if there's a single mapper and no need to sort or
reduce the output, you probably want to just implement this as a program
that happens to be using HDFS as a data store, and not bother with
MapReduce at all.

On 12/6/09 10:03 AM, Sonal Goyal wrote:
> Hi,
> Maybe you could post your code/logic for doing this. One way would be to set
> a flag once your criterion is met and emit keys based on the flag.
> Thanks and Regards,
> Sonal
> 2009/12/5 Gang Luo <lgpublic@yahoo.com.cn>
>> Hi all,
>> I have a tricky problem. I manually input a small file to do some filtering
>> work on each line in the map function. I check whether the line satisfies the
>> constraint; if so I output it, otherwise I return without doing any of the work below.
>> Since the map function is called on each line, I think the logic is
>> correct. But it doesn't work like this. If there are 5 lines for a map task,
>> and only the 2nd line satisfies the constraint, then the output will be lines
>> 2, 3, 4, and 5. If the 3rd line satisfies it, then the output will be lines 3, 4, 5.
>> It seems that once a map task meets the first satisfying line, the filter
>> doesn't work for the following lines.
>> It is an interesting problem. I am checking it now. I also hope someone could
>> give me some ideas on this. Thanks.
>> -Gang

