hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-403) close method in a Mapper should be provided with OutputCollector and a Reporter
Date Wed, 02 Aug 2006 21:14:15 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-403?page=comments#action_12425381 ] 
            
Owen O'Malley commented on HADOOP-403:
--------------------------------------

I'm very uncomfortable with passing the Reporter and OutputCollector via the JobConf.

It does two bad things:
  1. It passes "real" Java objects around in the JobConf, which breaks the assumption that
     the JobConf can be serialized successfully. (In this case it is OK, because the objects
     won't cross the process boundary, but it breaks developers' expectations.)
  2. It puts information that application writers need in a very obscure place. If I look
     at Mapper or Reducer, I won't see what is available to me. Only by scanning through the
     HUGE JobConf API would I discover that they are available.

I strongly suggest that we just take the hit and extend the Closeable interface. I'd propose:
  1. Deprecating the Closeable.close() method.
  2. Adding a new Closeable.close(OutputCollector, Reporter) method.
  3. Providing a default implementation of the new method in MapReduceBase that calls close().

That should minimize the breakage in user code and still make the intended interface clear.
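A minimal Java sketch of the three-step proposal above. The OutputCollector and Reporter interfaces here are simplified, hypothetical stand-ins (the real Hadoop interfaces take Writable keys and values and have more methods), and the LegacyMapper class is likewise hypothetical. It illustrates the proposed shape only; it is not the API Hadoop actually shipped.

```java
import java.io.IOException;

/** Hypothetical, simplified stand-in for Hadoop's OutputCollector. */
interface OutputCollector {
    void collect(Object key, Object value) throws IOException;
}

/** Hypothetical, simplified stand-in for Hadoop's Reporter. */
interface Reporter {
    void setStatus(String status) throws IOException;
}

/** Proposed shape of Closeable after the change. */
interface Closeable {
    /** @deprecated use {@link #close(OutputCollector, Reporter)} instead. */
    @Deprecated
    void close() throws IOException;

    /** New hook: a task can emit final output and report status while closing. */
    void close(OutputCollector output, Reporter reporter) throws IOException;
}

/** Step 3: the default new close() delegates to the old one. */
class MapReduceBase implements Closeable {
    @Deprecated
    public void close() throws IOException {
        // No-op by default, as before the change.
    }

    public void close(OutputCollector output, Reporter reporter) throws IOException {
        close(); // Delegate so subclasses that only override close() keep working.
    }
}

/** Hypothetical pre-existing user class that only knows about the old close(). */
class LegacyMapper extends MapReduceBase {
    boolean closed = false;

    @Override
    public void close() {
        closed = true; // Old-style cleanup still runs via the new method.
    }
}
```

Because the default close(OutputCollector, Reporter) delegates to close(), the framework could switch to calling only the new method without breaking subclasses that override just the old one.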

> close method in a Mapper should be provided with OutputCollector and a Reporter
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-403
>                 URL: http://issues.apache.org/jira/browse/HADOOP-403
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.5.0
>         Environment: all
>            Reporter: Milind Bhandarkar
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.5.0
>
>
> For mappers with side effects, or mappers that work as aggregators (i.e. no output on
> individual key-value pairs, but an aggregate output at the end of all key-value pairs),
> output should be performed in the close method. For this purpose, we need to supply an
> output collector and a reporter to the close method of Mapper. This involves an interface
> change, though. Thoughts?
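The aggregator case described in the issue can be sketched as follows. SumMapper is hypothetical, and OutputCollector and Reporter are simplified stand-ins (assumptions, not Hadoop's real signatures): map() only updates running totals, and the single aggregate record is emitted from close(), which is possible only if close() receives an OutputCollector.

```java
import java.io.IOException;

/** Hypothetical, simplified stand-in for Hadoop's OutputCollector. */
interface OutputCollector {
    void collect(Object key, Object value) throws IOException;
}

/** Hypothetical, simplified stand-in for Hadoop's Reporter. */
interface Reporter {
    void setStatus(String status) throws IOException;
}

/** Aggregating mapper: no per-record output, one aggregate record at the end. */
class SumMapper {
    private long count = 0;
    private long sum = 0;

    public void map(Object key, long value, OutputCollector output, Reporter reporter)
            throws IOException {
        // Only accumulate; nothing is collected per input record.
        count++;
        sum += value;
    }

    public void close(OutputCollector output, Reporter reporter) throws IOException {
        // The aggregate can be emitted here only because close() gets a collector.
        reporter.setStatus("emitting aggregate of " + count + " records");
        output.collect("sum", sum);
    }
}
```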

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
