crunch-dev mailing list archives

From "Josh Wills (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-400) Materialized jobs should have stage in PipelineResult
Date Fri, 06 Jun 2014 23:40:01 GMT


Josh Wills commented on CRUNCH-400:

No, I'm good with understanding the issue, just been busy with the 0.10.0 and 0.8.3 releases.
Will pick this up again next week. In the meantime, the workaround seems simple enough, modulo
my comment above:

Iterable<String> preMaterialized = dataToBeMaterialized.materialize();
PipelineResult res = pipeline.run();
Set<String> materializedData = Sets.newHashSet(preMaterialized);

i.e., if you call materialize() but don't call iterator() on the returned Iterable object,
then call run(), and only _then_ read the data from the Iterable by calling iterator(), you
will get the counter stats for the materialized object via the PipelineResult that run() returns.

That said, it seems reasonable that the underlying MaterializableIterable (which is the object
that is returned by materialize() ) would hold on to the PipelineResult that is returned when
it makes a call to run() and allow the client to access it, but even that solution will require
some modification to your existing pipeline code.
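
To make that second idea concrete, here is a rough sketch of an Iterable that triggers the run on first read and keeps the result around for the client. This is not Crunch code: ResultHoldingIterable, runPipeline, readOutput and getResult() are all made-up names for illustration, and the type parameter R only stands in for PipelineResult.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in, NOT Crunch's MaterializableIterable.
// R plays the role of PipelineResult; T is the element type.
class ResultHoldingIterable<T, R> implements Iterable<T> {
    private final Supplier<R> runPipeline;      // assumption: wraps the call to run()
    private final Supplier<List<T>> readOutput; // assumption: reads the materialized output
    private R result;                           // kept so the client can inspect it later
    private boolean ran = false;

    ResultHoldingIterable(Supplier<R> runPipeline, Supplier<List<T>> readOutput) {
        this.runPipeline = runPipeline;
        this.readOutput = readOutput;
    }

    @Override
    public Iterator<T> iterator() {
        if (!ran) {
            // First read triggers the run; hold on to whatever it returns.
            result = runPipeline.get();
            ran = true;
        }
        return readOutput.get().iterator();
    }

    // Null until iterator() has triggered a run.
    R getResult() {
        return result;
    }
}
```

With something along these lines, client code would read the data first and then ask the Iterable for the result, which is the pipeline-code change mentioned above.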

> Materialized jobs should have stage in PipelineResult
> -----------------------------------------------------
>                 Key: CRUNCH-400
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.9.0, 0.8.2
>            Reporter: Micah Whitacre
> Brought up as part of the proposed fix for CRUNCH-272 and on the mailing list[1], a set
> of jobs kicked off due to a materialize() call will not be tracked as part of the Pipeline's
> stage results returned by the PipelineResult.
> [1] -

