crunch-dev mailing list archives

From "David Whiting (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-441) Crunch on Tez
Date Wed, 23 Jul 2014 12:13:38 GMT


David Whiting commented on CRUNCH-441:

After two of us spending a full day on it, we determined the following:

- Tez DAGs map reasonably well to the graphs built by Crunch's MapReduce implementation, so there's
no reason why this shouldn't be possible in theory.
- Tez's API is much lower level than we expected, meaning that the implementation might well
be more complex than we anticipated. It does appear to have a slightly higher-level Map-Reduce-Reduce
implementation which could make for an easier transition for Crunch, but it was difficult to
find information about it.
- The Tez API does not seem to be particularly stable yet.

We probably won't have time to look at this again for a while, so if anyone wants to take up
the baton, that would be really great. There are presumably implementations around for similar things
(such as in the Hive source and in a Cascading branch somewhere) that could be used for reference;
otherwise maybe we'll take another look when the API and docs seem a bit more stable and complete.

> Crunch on Tez
> -------------
>                 Key: CRUNCH-441
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: David Whiting
> Tez is potentially a better drop-in replacement for MR than Spark on many existing Hadoop
> environments, because it doesn't require always-on resources and is less memory-hungry than
> Spark whilst still providing huge performance gains as can be seen in new versions of Hive.

This message was sent by Atlassian JIRA
