crunch-dev mailing list archives

From "Tom White (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-235) Avoid exposing incompatible Hadoop classes in Crunch API
Date Fri, 05 Jul 2013 15:33:49 GMT


Tom White commented on CRUNCH-235:

The TaskAttemptContext stuff is a problem, but it should only be a problem for Crunch itself,
not for Crunch apps. This change would help a bit by removing the leakage of incompatible
classes into Crunch apps. I agree that you'd have to select the right Crunch JAR, although
that could be done at runtime, without any recompilation.
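A minimal sketch of how that runtime selection could work. It assumes a launcher inspects the Hadoop classes on the classpath and picks a Crunch build from that. `HadoopFlavor`, `isInterface` and `artifactFor` are hypothetical names, not part of Crunch, and the returned artifact labels are placeholders. The check relies on the fact stated in the issue below: `org.apache.hadoop.mapreduce.Counter` is a class in Hadoop 1 and an interface in Hadoop 2.

```java
// Hypothetical sketch: decide which Hadoop line is present at runtime by
// checking whether Counter is a class (Hadoop 1) or an interface (Hadoop 2).
public final class HadoopFlavor {

    private HadoopFlavor() {}

    /** True if the named type is an interface; unchecked exception if it is absent. */
    public static boolean isInterface(String className) {
        try {
            // initialize=false: we only need the Class object, not its static initializers
            Class<?> type = Class.forName(className, false, HadoopFlavor.class.getClassLoader());
            return type.isInterface();
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Not on the classpath: " + className, e);
        }
    }

    /** Maps the check to an artifact label; the labels are illustrative placeholders. */
    public static String artifactFor(String counterClassName) {
        return isInterface(counterClassName) ? "hadoop2" : "hadoop1";
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.mapreduce.Counter";
        System.out.println(artifactFor(name));
    }
}
```

Run without Hadoop on the classpath, `main` fails with the `IllegalArgumentException`; with a Hadoop JAR present it prints the label for that line.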

I did look to see whether the approach we recently used in Parquet to produce a single JAR
would work in Crunch (,
but, as you point out, the Crunch code is too performance-critical for the reflection-based
approach used there. In Parquet it isn't a problem, since only the per-task or per-job calls
that get the Configuration have to be made reflectively. In Crunch, OutputEmitter calls write()
for every record, so it's not a good candidate for reflection.
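To make the trade-off concrete, here is a minimal sketch of the general reflection pattern: resolve a `Method` once by name, cache it, and invoke it reflectively. This is not Parquet's actual code; `CachedReflectiveCall` is a hypothetical name. Resolving the method once per task or job is cheap overall, but each `invoke` still allocates an argument array, boxes primitive arguments and wraps exceptions, and is typically harder for the JIT to optimise than a direct call. That cost is negligible for a Configuration lookup and significant on a per-record `write()` path.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Hypothetical sketch: a method looked up once by name and then invoked reflectively. */
public final class CachedReflectiveCall {

    private final Method method;

    public CachedReflectiveCall(Class<?> owner, String methodName, Class<?>... parameterTypes) {
        try {
            // Resolved once, e.g. per task or per job
            this.method = owner.getMethod(methodName, parameterTypes);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(
                "No public method " + methodName + " on " + owner.getName(), e);
        }
    }

    /** Invokes the cached method; each call pays the reflective overhead again. */
    public Object invoke(Object target, Object... args) {
        try {
            return method.invoke(target, args);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        } catch (InvocationTargetException e) {
            // Unwrap the exception thrown by the target method itself
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

In a Parquet-style setup, the owner would be a Hadoop context type and the method something like a Configuration getter, resolved against whichever Hadoop version is present. In the Crunch case it would have to sit inside the emitter's per-record loop, which is exactly where the overhead hurts.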
> Avoid exposing incompatible Hadoop classes in Crunch API
> --------------------------------------------------------
>                 Key: CRUNCH-235
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: CRUNCH-235.patch
> Between Hadoop 1 and 2, org.apache.hadoop.mapreduce.Counter changed from a class to an
> interface. Therefore, exposing Counter in Crunch's API means that Crunch programs may need
> to be recompiled when moving from a Hadoop 1 to a Hadoop 2 cluster. It would be nice to avoid
> the need to recompile.
> Note that Crunch itself has two artifacts - one for each major version of Hadoop - and
> the change proposed here would not alter that, it would just mean that the same Crunch program
> binary could be used with either Crunch artifact.
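The recompilation requirement follows from how the JVM links calls. A call to a Counter method compiled against Hadoop 1 uses `invokevirtual`. Against Hadoop 2, where Counter is an interface, the same call needs `invokeinterface`, so running the old binary typically fails with `IncompatibleClassChangeError`. A minimal sketch of one way to keep Counter out of user-facing signatures follows: user code sees only JDK types, and only the framework's implementation would touch real Hadoop counters. This is not the contents of CRUNCH-235.patch. `CounterFacadeSketch`, `CounterSink`, `InMemoryCounters` and `countRecord` are hypothetical, and the in-memory class stands in for a Hadoop-backed implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch: the API exposes counters by group/name strings only,
 * so compiled user code holds no reference to org.apache.hadoop.mapreduce.Counter.
 */
public final class CounterFacadeSketch {

    /** What user code sees: names and an amount, no Hadoop types. */
    public interface CounterSink {
        void increment(String group, String name, long amount);
    }

    /** In-memory stand-in for a Hadoop-backed implementation (e.g. for local tests). */
    public static final class InMemoryCounters implements CounterSink {
        private final Map<String, AtomicLong> values = new ConcurrentHashMap<>();

        @Override
        public void increment(String group, String name, long amount) {
            // Simple composite key; assumes group names contain no ':'
            values.computeIfAbsent(group + ":" + name, k -> new AtomicLong()).addAndGet(amount);
        }

        public long get(String group, String name) {
            AtomicLong v = values.get(group + ":" + name);
            return v == null ? 0L : v.get();
        }
    }

    /** Example user code: depends only on CounterSink, never on Counter. */
    public static void countRecord(CounterSink counters, String record) {
        counters.increment("records", record.isEmpty() ? "empty" : "nonEmpty", 1L);
    }
}
```

Because `countRecord` compiles to an `invokeinterface` on `CounterSink`, a type owned by the framework, its bytecode is the same whichever Crunch artifact it runs against. Only the framework's two builds differ in how they bind to Hadoop's Counter.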

This message is automatically generated by JIRA.