hadoop-common-dev mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3702) add support for chaining Maps in a single Map and after a Reduce [M*/RM*]
Date Tue, 05 Aug 2008 10:50:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12619840#action_12619840 ]

Enis Soztutar commented on HADOOP-3702:
---------------------------------------

bq. On #1, Yes but this introduces a backwards incompatibility as there is code out there
(outside of core) that uses the write(OutputStream) with DataOutputStream instances and the
change is breaking such usages (as the previous patch found).

Yes, at some level it will introduce a backwards incompatibility. But thinking in abstract terms,
having {{func(Interface1 i1)}} and introducing {{func(Interface2 i2)}} is not a direct incompatibility.
Only calls where the argument implements both Interface1 and Interface2 are affected (in
this case DataOutputStream, which is both an OutputStream and a DataOutput), and there is a
clear workaround for this (for example, an explicit cast at the call site). I think introducing
a Serialization for Configuration is far worse. I suggest we keep write(OutputStream),
introduce write(DataOutput), fix all the cases where DataOutputStream is passed (including
contrib), and mark this change as incompatible with clear documentation in the release notes.
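
To make the overload argument concrete, here is a minimal sketch of the situation. The {{OverloadSketch}} class and its two static {{write}} methods are hypothetical stand-ins, not the real Configuration code; they only report which overload was chosen. It assumes nothing beyond the JDK's java.io types.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.OutputStream;

public class OverloadSketch {
    // Hypothetical stand-in for the existing method: accepts any OutputStream.
    static String write(OutputStream out) {
        return "OutputStream";
    }

    // Hypothetical stand-in for the proposed new overload: accepts any DataOutput.
    static String write(DataOutput out) {
        return "DataOutput";
    }

    public static void main(String[] args) {
        DataOutputStream dos = new DataOutputStream(new ByteArrayOutputStream());

        // write(dos);  // would not compile: DataOutputStream matches both
        //              // overloads and neither parameter type is more specific.

        // Workaround: an explicit cast selects the intended overload.
        System.out.println(write((OutputStream) dos));
        System.out.println(write((DataOutput) dos));
    }
}
```

Callers that pass a plain OutputStream or a plain DataOutput are unaffected; only call sites whose argument has the static type DataOutputStream (or another type implementing both) need the cast.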


bq. On #2, I'm missing something here, under what circumstances would it make sense to use
the Chain* classes with generics that would be checked at compile time? If it is just a way
of avoiding the @SuppressWarnings annotations I'd prefer the annotations as, IMO, they are
meant for these cases.

@SuppressWarnings annotations are just "hacks" to stop the compiler from complaining. There
are valid reasons for the compiler to issue warnings, and instead of fixing the cause, we tell the
compiler to ignore them, which is not desirable.
The motivation to use generics is the same as for the use of generics in Mapper, Reducer,
etc. I guess with a little extra effort, we could make this change, no?
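
As a rough illustration of what compile-time checking could buy in a chain, here is a small sketch. The {{Step}} interface and its {{then}} method are invented for this example and are much simpler than Mapper (no keys, no OutputCollector); this is not the Chain* API from the patch, only the general idea of a typed chain.

```java
public class TypedChainSketch {
    // Hypothetical typed stage: consumes an I and produces an O.
    interface Step<I, O> {
        O apply(I in);

        // Chaining only compiles when the next stage accepts this stage's output type.
        default <R> Step<I, R> then(Step<O, R> next) {
            return in -> next.apply(apply(in));
        }
    }

    public static void main(String[] args) {
        Step<String, Integer> length = String::length;
        Step<Integer, String> describe = n -> "len=" + n;

        // String -> Integer -> String: the types line up, so no unchecked
        // casts and no @SuppressWarnings are needed.
        Step<String, String> chain = length.then(describe);
        System.out.println(chain.apply("hadoop")); // prints "len=6"

        // describe.then(describe);  // would not compile: the second stage
        //                           // expects an Integer but would receive a String.
    }
}
```

With raw types the mismatched chain in the last comment would compile and fail only at runtime with a ClassCastException.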

> add support for chaining Maps in a single Map and after a Reduce [M*/RM*]
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-3702
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3702
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>         Environment: all
>            Reporter: Alejandro Abdelnur
>            Assignee: Alejandro Abdelnur
>         Attachments: patch3702.txt, patch3702.txt, patch3702.txt, patch3702.txt, patch3702.txt,
>                      patch3702.txt, patch3702.txt, patch3702.txt, patch3702.txt, patch3702.txt
>
>
> On the same input, we usually need to run multiple Maps one after the other with no
> Reduce. We also have to run multiple Maps after the Reduce.
> If all pre-Reduce Maps are chained together and run as a single Map, a significant amount
> of disk I/O will be avoided.
> Similarly, all post-Reduce Maps can be chained together and run in the Reduce phase after
> the Reduce.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

