crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-441) Crunch on Tez
Date Wed, 23 Jul 2014 13:11:38 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071690#comment-14071690 ]

Gabriel Reid commented on CRUNCH-441:
-------------------------------------

Thanks for the info [~davw]. FWIW, I took a look at this a while back as well, and I'd say
that your summary is pretty much exactly in line with what I found at the time:
* more complex to get started than expected
* due to a lack of API stability and documentation, now probably isn't the best time (or at
least not the easiest time) to attempt to do it

One additional interesting thing that Tez has is the concept of edge properties. I suppose
an initial implementation of Crunch on Tez would always use the "scatter-gather" edge property,
but I'm thinking there could be some interesting ways to use other edge properties in Crunch.
On the other hand, this would probably also mean that there would be some changes needed in
the Crunch API in order to allow taking advantage of edge properties.
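
To make the edge-property idea a bit more concrete, here is a toy Python model of how the data-movement part of an edge property decides which downstream tasks see each upstream task's output. This is not the Tez API: the three mode names only mirror Tez's one-to-one, broadcast, and scatter-gather movement types, and the function name `route` and the CRC32-based partitioning are assumptions made purely for illustration.

```python
import zlib
from collections import defaultdict


def route(producer_outputs, num_consumers, movement):
    """Toy model (hypothetical, not Tez code) of edge data movement.

    producer_outputs: one list of (key, value) records per producer task.
    movement: "one_to_one", "broadcast", or "scatter_gather".
    Returns {consumer_index: [(key, value), ...]}.
    """
    if movement == "one_to_one" and len(producer_outputs) != num_consumers:
        raise ValueError("one_to_one needs equal producer and consumer counts")

    inbox = defaultdict(list)
    for i, records in enumerate(producer_outputs):
        if movement == "one_to_one":
            # Producer i feeds only consumer i.
            inbox[i].extend(records)
        elif movement == "broadcast":
            # Every consumer receives every producer's full output.
            for c in range(num_consumers):
                inbox[c].extend(records)
        elif movement == "scatter_gather":
            # Each record goes to the consumer that owns its key's partition,
            # so all records sharing a key meet at one consumer (MR-style shuffle).
            for key, value in records:
                partition = zlib.crc32(str(key).encode("utf-8")) % num_consumers
                inbox[partition].append((key, value))
        else:
            raise ValueError("unknown movement type: %r" % (movement,))
    return dict(inbox)
```

Scatter-gather is the mode that corresponds to the shuffle a MapReduce-style grouping relies on. Broadcast is the kind of movement a map-side join could use to ship a small table to every task, which is the sort of case where an API hook for choosing the edge type might pay off.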

> Crunch on Tez
> -------------
>
>                 Key: CRUNCH-441
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-441
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: David Whiting
>
> Tez is potentially a better drop-in replacement for MR than Spark on many existing Hadoop
> environments, because it doesn't require always-on resources and is less memory-hungry than
> Spark whilst still providing huge performance gains as can be seen in new versions of Hive.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
