crunch-user mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: PipelineResult VS materialize()
Date Wed, 06 Jan 2016 16:19:43 GMT
I added a getPipelineResult() method to the MaterializableIterable in
CRUNCH-400: does it not do what you want?
https://github.com/apache/crunch/commit/ded504eb133fa0814e2d90ff2a662e72a67e04bb

On Wed, Jan 6, 2016 at 6:19 AM, Clément MATHIEU <clement@unportant.info>
wrote:

> Hi,
>
> Most of the pipelines I write rely heavily on counters as a monitoring
> tool: DoFns increment counters and the driver is in charge of pushing
> the final value into a metric store.
>
> Everything is fine for pipelines relying exclusively on run|runAsync.
>
> Things get a bit uglier when materialize is used:
>
>  - If we stick to the public API, it is not possible to get access to the
>    PipelineResult possibly created by a call to materialize. One has to
>    cast the iterable to MaterializableIterable.
>
>  - Code dealing with the iterable most likely does not care about counters
>    at all, but must extract the PipelineResult and pass it to someone else
>    so as not to lose it.
>
>
> Would it not make sense to give access to all PipelineResults created by
> the pipeline in a central place?
>
> The pipeline could store them, something like
> https://gist.github.com/cykl/7f71c1a3dff3f881f3ba, or a callback could be
> used.
>
> Regards,
>
> Clément
>
>
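
The "central place" idea from the quoted message could be sketched roughly as below. This is not Crunch code and not the content of the linked gist: FakeResult is a hypothetical stand-in for PipelineResult (reduced to a map of counter values), and ResultRegistry, record() and totals() are names invented for illustration. It only shows the shape of the proposal, where every run or materialize call records its result in one place and the driver reads the counters from there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ResultRegistryDemo {

    /** Hypothetical stand-in for a pipeline result: only carries counter values. */
    static final class FakeResult {
        final Map<String, Long> counters;

        FakeResult(Map<String, Long> counters) {
            this.counters = new TreeMap<>(counters);
        }
    }

    /** Hypothetical central store that collects every result produced by a pipeline. */
    static final class ResultRegistry {
        private final List<FakeResult> results = new ArrayList<>();

        /** Would be called wherever a result is created (run, runAsync, materialize). */
        synchronized void record(FakeResult result) {
            results.add(result);
        }

        /** Number of results recorded so far. */
        synchronized int size() {
            return results.size();
        }

        /** Sums each counter across all recorded results, for the driver to publish. */
        synchronized Map<String, Long> totals() {
            Map<String, Long> totals = new TreeMap<>();
            for (FakeResult result : results) {
                for (Map.Entry<String, Long> e : result.counters.entrySet()) {
                    totals.merge(e.getKey(), e.getValue(), Long::sum);
                }
            }
            return totals;
        }
    }

    /** Small helper to build a counter map without relying on Map.of. */
    static Map<String, Long> counters(String name, long value) {
        Map<String, Long> m = new TreeMap<>();
        m.put(name, value);
        return m;
    }

    public static void main(String[] args) {
        ResultRegistry registry = new ResultRegistry();

        // One result from an explicit run, one from an implicit materialize.
        registry.record(new FakeResult(counters("records.read", 10L)));
        registry.record(new FakeResult(counters("records.read", 5L)));

        // The driver reads all counters from one place, whichever call produced them.
        System.out.println(registry.totals());
    }
}
```

With something like this, code that iterates over a materialized collection would not need to know about counters at all; only the driver would touch the registry.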
