crunch-dev mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-235) Avoid exposing incompatible Hadoop classes in Crunch API
Date Fri, 05 Jul 2013 15:33:49 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700935#comment-13700935 ]

Tom White commented on CRUNCH-235:
----------------------------------

The TaskAttemptContext incompatibility is a problem, but it should only be a problem for Crunch
itself, not for Crunch apps. This change would help a bit by removing the leakage of incompatible
classes into Crunch apps. I agree that you'd have to select the right Crunch JAR, although
that could be done at runtime, without any recompilation.

I did look to see whether the approach we recently used in Parquet to produce a single JAR
would work in Crunch (https://github.com/Parquet/parquet-mr/pull/32#issuecomment-17283008),
but as you point out, the Crunch code is too performance-critical to use the reflection-based
approach there. In Parquet it's not a problem, since only the per-task or per-job calls that
get the Configuration need to be made reflectively. In Crunch, OutputEmitter calls write()
for every record, so it's not a good candidate for reflection.
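
For illustration, the reflective style of access discussed above can be sketched as follows.
This is a minimal sketch, not the Parquet or Crunch code: StandInContext is a hypothetical
stand-in for a Hadoop context type, and its getConfiguration() returns a plain String so the
example runs without Hadoop on the classpath.

```java
import java.lang.reflect.Method;

public class ReflectionSketch {
    // Hypothetical stand-in for a Hadoop context type; not the real class.
    public static class StandInContext {
        private final String conf;
        public StandInContext(String conf) { this.conf = conf; }
        public String getConfiguration() { return conf; }
    }

    // Looks up getConfiguration() by name at runtime and invokes it, so the
    // caller is not compiled against the concrete context type.
    public static Object getConfiguration(Object context) {
        try {
            Method m = context.getClass().getMethod("getConfiguration");
            return m.invoke(context);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read configuration", e);
        }
    }

    public static void main(String[] args) {
        Object ctx = new StandInContext("job-conf");
        // A lookup like this is made once per task or per job, not per record.
        System.out.println(getConfiguration(ctx));
    }
}
```

A call like this costs a method lookup plus a reflective invocation. That is acceptable when
it happens once per task, but it is the kind of overhead one would want to avoid on a
per-record path such as write().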
                
> Avoid exposing incompatible Hadoop classes in Crunch API
> --------------------------------------------------------
>
>                 Key: CRUNCH-235
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-235
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: CRUNCH-235.patch
>
>
> Between Hadoop 1 and 2, org.apache.hadoop.mapreduce.Counter changed from a class to an
> interface. Therefore, exposing Counter in Crunch's API means that Crunch programs may need
> to be recompiled when moving from a Hadoop 1 to a Hadoop 2 cluster. It would be nice to avoid
> the need to recompile.
> Note that Crunch itself has two artifacts - one for each major version of Hadoop - and
> the change proposed here would not alter that; it would just mean that the same Crunch program
> binary could be used with either Crunch artifact.
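
Recompilation is needed because Java bytecode calls a method on a class and a method on an
interface with different instructions, so a call site compiled against one form does not link
against the other. One general way to keep such a type out of an API is a facade owned by the
library. The sketch below is only an illustration of that idea and is not taken from the
attached patch: FrameworkCounter, AppCounter and wrap() are hypothetical names, and
FrameworkCounter stands in for the Hadoop type.

```java
public class CounterFacadeSketch {
    // Stand-in for the framework counter whose shape differs between versions.
    public static class FrameworkCounter {
        private long value;
        public void increment(long delta) { value += delta; }
        public long getValue() { return value; }
    }

    // Hypothetical stable type that application code compiles against.
    public interface AppCounter {
        void increment(long delta);
        long getValue();
    }

    // Adapter that lives inside the library, which is built per framework version.
    public static AppCounter wrap(final FrameworkCounter c) {
        return new AppCounter() {
            public void increment(long delta) { c.increment(delta); }
            public long getValue() { return c.getValue(); }
        };
    }

    public static void main(String[] args) {
        AppCounter counter = wrap(new FrameworkCounter());
        counter.increment(3);
        counter.increment(4);
        System.out.println(counter.getValue());
    }
}
```

In this arrangement only the adapter touches the framework type, so application bytecode
refers to AppCounter alone and does not depend on whether the underlying counter is a class
or an interface.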

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
