hadoop-mapreduce-issues mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1638) Divide MapReduce into API and implementation source trees
Date Fri, 26 Mar 2010 18:58:27 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850303#action_12850303 ]

Tom White commented on MAPREDUCE-1638:
--------------------------------------

To flesh this out a little more: roughly speaking, the API tree would contain the public part
of o.a.h.mapred (the deprecated user API) as well as o.a.h.mapreduce. The library
tree would contain o.a.h.mapred.lib and o.a.h.mapreduce.lib (and subpackages). The implementation
tree would contain everything else (although there may be exceptions - classes that are considered
a part of the public API would go in the API tree even if they fall outside these packages).
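
As a rough illustration of the split described above, the following sketch assigns a class to
one of the three trees. The class name SourceTreeClassifier, the Tree enum and the
partOfPublicApi flag are hypothetical names invented for this example; they are not part of
Hadoop or of any patch on this issue, and the real decision about which classes count as
public API would be made case by case.

```java
/** Hypothetical helper, for illustration only: picks a source tree for a class. */
class SourceTreeClassifier {
    enum Tree { API, LIBRARY, IMPLEMENTATION }

    // Library roots named in the comment: o.a.h.mapred.lib and o.a.h.mapreduce.lib.
    private static final String[] LIBRARY_ROOTS = {
        "org.apache.hadoop.mapred.lib",
        "org.apache.hadoop.mapreduce.lib"
    };

    /** True if pkg is the given root package or one of its subpackages. */
    static boolean isSameOrSubpackage(String pkg, String root) {
        return pkg.equals(root) || pkg.startsWith(root + ".");
    }

    /**
     * @param pkg             package of the class being classified
     * @param partOfPublicApi assumption: someone has already decided whether the
     *                        class belongs to the public user-facing API
     */
    static Tree classify(String pkg, boolean partOfPublicApi) {
        // Library packages and their subpackages go to the library tree.
        for (String root : LIBRARY_ROOTS) {
            if (isSameOrSubpackage(pkg, root)) {
                return Tree.LIBRARY;
            }
        }
        // Public user-facing classes go to the API tree, including the exceptions
        // that live outside o.a.h.mapred and o.a.h.mapreduce.
        if (partOfPublicApi) {
            return Tree.API;
        }
        // Everything else is implementation.
        return Tree.IMPLEMENTATION;
    }

    public static void main(String[] args) {
        System.out.println(classify("org.apache.hadoop.mapreduce", true));
        System.out.println(classify("org.apache.hadoop.mapred.lib.aggregate", false));
        System.out.println(classify("org.apache.hadoop.mapred", false));
    }
}
```

Note that the package name stays the same whichever tree a class lands in; only the
directory it is built from changes.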

This change would mark a very clear boundary between the public user-facing API and the internal
implementation. Having separate source trees is a common approach in many projects. The use
of annotations introduced in HADOOP-5073 doesn't provide such a clear demarcation (since you
can't conditionally compile according to the presence of an annotation), but is still useful
for more fine-grained distinctions.

Note that this change would not introduce an incompatible change, since classes would be moved
between trees and remain in the same packages.

I see the following advantages:
* If we created separate JARs for each tree, clients could compile against the API and library
JARs without inadvertently introducing dependencies on implementation classes that happen to
be public. Even if a class is marked as InterfaceAudience.Private, it is easy to accidentally
have the IDE pick it up.
* It makes MAPREDUCE-1478 (shipping modified libraries) easy to implement.
* We can enforce that the kernel (user API) and implementation don't depend on the libraries.
(MAPREDUCE-1453)
* It helps enforce compatibility. From a review standpoint it would be easier to see if a
patch modifies a public API, since it is in its own tree. Similarly, we would publish javadoc
for API and library, and generate JDiff against them. Currently JDiff output is very large
due to a large number of false positives, so it is difficult to see real incompatibility problems.
(HADOOP-6658 helps here, but the approach described in this issue solves the other problems
listed above too.)
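
To make the "kernel and implementation don't depend on the libraries" point concrete, here is
a minimal sketch of the kind of check that separate trees would let the build perform for
free. The class LibraryDependencyCheck and its methods are hypothetical and written only for
this example; with separate source trees the compiler classpath would enforce the same rule
without any such tool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical check, for illustration only: flags imports of library classes. */
class LibraryDependencyCheck {
    // Library roots named in this issue.
    private static final List<String> LIBRARY_ROOTS = Arrays.asList(
        "org.apache.hadoop.mapred.lib",
        "org.apache.hadoop.mapreduce.lib");

    /** True if the fully qualified class name lives under a library package. */
    static boolean isLibraryClass(String className) {
        for (String root : LIBRARY_ROOTS) {
            if (className.startsWith(root + ".")) {
                return true;
            }
        }
        return false;
    }

    /**
     * Returns the imports that a kernel (user API) or implementation class
     * should not have, i.e. those that point into the library packages.
     */
    static List<String> forbiddenImports(List<String> imports) {
        List<String> violations = new ArrayList<>();
        for (String imported : imports) {
            if (isLibraryClass(imported)) {
                violations.add(imported);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        // Example imports of an imagined kernel class; the second one is a violation.
        List<String> imports = Arrays.asList(
            "org.apache.hadoop.mapreduce.Job",
            "org.apache.hadoop.mapreduce.lib.input.FileInputFormat");
        System.out.println(forbiddenImports(imports));
    }
}
```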

Thoughts?

> Divide MapReduce into API and implementation source trees
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-1638
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1638
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: build, client
>            Reporter: Tom White
>            Assignee: Tom White
>
> I think it makes sense to separate the MapReduce source into public API and implementation
> trees. The public API could be broken further into kernel and library trees.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
