crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-470) Add hdfs/yarn minicluster crunch pipeline
Date Thu, 11 Sep 2014 11:16:34 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14129890#comment-14129890 ]

Gabriel Reid commented on CRUNCH-470:
-------------------------------------

Do you mean the addition of a new Pipeline implementation (in addition to MemPipeline, MRPipeline,
and SparkPipeline)? The MRPipeline implementation will already run on YARN as long as Crunch
is compiled for hadoop2, so there shouldn't be a new Pipeline impl needed for this.

On the other hand, if you're referring to testing pipelines on a pseudo-distributed mini cluster,
that is already possible. This is what's actually done in the HFileTargetIT integration
test: a mini-cluster (with HDFS, etc.) is spun up and the pipeline is run there.
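
As a rough illustration of the point above, here is a dependency-free sketch of the Hadoop 2 configuration keys that usually decide whether MapReduce jobs run locally or are submitted to YARN. It is not taken from Crunch or from HFileTargetIT. The class name, method names, and addresses are hypothetical, and a plain Map stands in for org.apache.hadoop.conf.Configuration so the snippet runs without Hadoop on the classpath. In a real test the mini-cluster would normally hand back a Configuration with these values already filled in, and that object would be passed to the MRPipeline constructor.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects the Hadoop 2 property names that select the
// execution framework and file system. A plain Map is used here instead of a
// Hadoop Configuration purely to keep the sketch self-contained.
class MiniClusterSettings {

    // Settings that keep MapReduce jobs inside the local JVM (no cluster).
    static Map<String, String> localSettings() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("mapreduce.framework.name", "local");
        settings.put("fs.defaultFS", "file:///");
        return settings;
    }

    // Settings that point jobs at a YARN ResourceManager and an HDFS NameNode.
    // Both addresses are supplied by the caller; with a mini-cluster they are
    // typically assigned at startup rather than hard-coded.
    static Map<String, String> yarnSettings(String hdfsUri, String resourceManagerAddress) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("mapreduce.framework.name", "yarn");
        settings.put("fs.defaultFS", hdfsUri);
        settings.put("yarn.resourcemanager.address", resourceManagerAddress);
        return settings;
    }

    public static void main(String[] args) {
        // Placeholder addresses, for illustration only.
        Map<String, String> settings = yarnSettings("hdfs://localhost:8020", "localhost:8032");
        settings.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

With Hadoop and Crunch available, the same keys would be set on a Configuration and the pipeline created with new MRPipeline(SomeJobClass.class, conf), where SomeJobClass is a placeholder for a class in the job jar.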

> Add hdfs/yarn minicluster crunch pipeline
> -----------------------------------------
>
>                 Key: CRUNCH-470
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-470
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.3
>            Reporter: Rafal Wojdyla
>            Assignee: Josh Wills
>            Priority: Minor
>
> Crunch currently has two pipelines:
> * MemPipeline
> * MRPipeline
> MemPipeline is an in-memory pipeline based on the local in-memory MapReduce mode.
> MRPipeline is a distributed pipeline based on distributed MapReduce.
> Using an HDFS/YARN minicluster, it's possible to better emulate a Hadoop cluster, and it could
> be a 'final test' before running on the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
