crunch-dev mailing list archives

From "Anuj Ojha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-400) Materialized jobs should have stage in PipelineResult
Date Tue, 27 May 2014 21:19:04 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010286#comment-14010286 ]

Anuj Ojha commented on CRUNCH-400:
----------------------------------

Hello Josh, below is what we are doing:

{code}
// ... some earlier processing (Map/Reduce jobs) ...

// Arguments to by() and parallelDo() are elided throughout.
PCollection hbaseData = getDataFromHbase();

PTable hbaseDataTable = hbaseData.by();

PGroupedTable hbasePGroupTable = hbaseDataTable.groupByKey();

PCollection hFileData = hbasePGroupTable.parallelDo("Convert this data to HFile");

writeHFile(hFileData);

PCollection<String> dataToBeMaterialized = hbasePGroupTable.parallelDo();

// materialize() kicks off the jobs needed to produce this data
Set<String> materializedData = Sets.newHashSet(dataToBeMaterialized.materialize());
{code}

Is this what you are looking for? Or do you need more information regarding this?

> Materialized jobs should have stage in PipelineResult
> -----------------------------------------------------
>
>                 Key: CRUNCH-400
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-400
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.9.0, 0.8.2
>            Reporter: Micah Whitacre
>
> Brought up as part of the proposed fix for CRUNCH-272 and on the mailing list[1], a set of jobs kicked off due to a materialize() call will not be tracked as part of the Pipeline's stage results returned by the PipelineResult.
> [1] - http://mail-archives.apache.org/mod_mbox/crunch-dev/201405.mbox/%3CCANFazTUAffvTctK5%3DWvW4KyBLSqLCNcke7ZMWwgASu%2BEtkDmyQ%40mail.gmail.com%3E
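
To make the gap described in the issue concrete, here is a minimal toy model in plain Java. It does not use the Crunch API and is not how Crunch is implemented: ToyPipeline, ToyResult, plan(), and executedStages() are hypothetical names invented for this sketch. It only assumes the behavior stated in the issue, namely that stages run because of a materialize() call do not show up in the stage results handed back to the caller.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MaterializeStageSketch {

    /** Hypothetical stand-in for a pipeline result: the stage names it was given. */
    static final class ToyResult {
        final List<String> stageNames;

        ToyResult(List<String> stageNames) {
            this.stageNames = Collections.unmodifiableList(new ArrayList<>(stageNames));
        }
    }

    /** Hypothetical stand-in for a pipeline; a toy model, not Crunch code. */
    static final class ToyPipeline {
        private final List<String> pending = new ArrayList<>();
        private final List<String> allExecuted = new ArrayList<>();

        /** Queues a stage to be executed on the next run. */
        void plan(String stageName) {
            pending.add(stageName);
        }

        /** Executes every pending stage and reports exactly those stages. */
        ToyResult run() {
            ToyResult result = new ToyResult(pending);
            allExecuted.addAll(pending);
            pending.clear();
            return result;
        }

        /** Executes pending stages as a side effect but discards their result. */
        List<String> materialize() {
            run(); // the stage information is dropped here, mirroring the reported gap
            return Collections.singletonList("materialized-value");
        }

        /** Everything that actually ran, whether or not a result reported it. */
        List<String> executedStages() {
            return Collections.unmodifiableList(allExecuted);
        }
    }

    public static void main(String[] args) {
        ToyPipeline pipeline = new ToyPipeline();
        pipeline.plan("group-by-key");
        pipeline.plan("write-hfile");
        pipeline.materialize();            // runs both stages, result is lost

        pipeline.plan("final-write");
        ToyResult result = pipeline.run(); // only sees the stage planned afterwards

        System.out.println("reported stages: " + result.stageNames);
        System.out.println("executed stages: " + pipeline.executedStages());
    }
}
```

In this toy model the caller's result lists only the stage planned after materialize(), while three stages actually ran. That difference is the kind of missing stage information the issue asks to have included in PipelineResult.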



--
This message was sent by Atlassian JIRA
(v6.2#6252)
