crunch-user mailing list archives

From Clément MATHIEU <clem...@unportant.info>
Subject PipelineResult VS materialize()
Date Wed, 06 Jan 2016 14:19:13 GMT
Hi,

Most of the pipelines I write rely heavily on counters as a monitoring
tool: DoFns increment counters, and the driver is in charge of pushing the
final values into a metric store.
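A minimal sketch of that driver-side reporting step follows. StageCounters, MetricStore and CounterReporter are hypothetical stand-ins, not Crunch classes. In a real driver the counter values would come from the stage results held by a PipelineResult.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the counters of one finished stage
// (in Crunch these would be read from the PipelineResult's stage results).
final class StageCounters {
    final String stageName;
    final Map<String, Long> values;

    StageCounters(String stageName, Map<String, Long> values) {
        this.stageName = stageName;
        this.values = values;
    }
}

// Hypothetical metric store client (Graphite, InfluxDB, ... in practice).
interface MetricStore {
    void push(String metric, long value);
}

// Driver-side helper: pushes every counter of every stage as "<stage>.<counter>".
final class CounterReporter {
    private final MetricStore store;

    CounterReporter(MetricStore store) {
        this.store = store;
    }

    void report(List<StageCounters> stages) {
        for (StageCounters stage : stages) {
            for (Map.Entry<String, Long> e : stage.values.entrySet()) {
                store.push(stage.stageName + "." + e.getKey(), e.getValue());
            }
        }
    }
}
```

This only works if the driver actually holds every PipelineResult, which is exactly what breaks down with materialize().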

This works fine for pipelines that rely exclusively on run() or runAsync().

Things get a bit uglier when materialize() is used:

  - If we stick to the public API, there is no way to get access to the
    PipelineResult that a call to materialize() may create. One has to cast
    the Iterable to MaterializeIterable.

  - Code dealing with the Iterable most likely does not care about counters
    at all, but must still extract the PipelineResult and hand it to someone
    else so that it is not lost.


Wouldn't it make sense to give access to all the PipelineResults created by
the pipeline in a central place?

The pipeline could store them (something like
https://gist.github.com/cykl/7f71c1a3dff3f881f3ba), or a callback could be
used.
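A minimal sketch of both options (storing the results and notifying a callback), assuming a hypothetical ResultRegistry owned by the pipeline. The type parameter R stands in for PipelineResult so the example compiles without Crunch. Neither the class nor its methods exist in Crunch, and the linked gist may take a different approach.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical central registry; R stands in for Crunch's PipelineResult.
final class ResultRegistry<R> {
    // Thread-safe lists, since runAsync() may complete on another thread.
    private final List<R> results = new CopyOnWriteArrayList<>();
    private final List<Consumer<? super R>> listeners = new CopyOnWriteArrayList<>();

    // Callback option: the listener is invoked once per recorded result.
    void addListener(Consumer<? super R> listener) {
        listeners.add(listener);
    }

    // Hypothetically called by the pipeline each time a run finishes,
    // whether it was triggered by run(), runAsync() or materialize().
    void record(R result) {
        results.add(result);
        for (Consumer<? super R> listener : listeners) {
            listener.accept(result);
        }
    }

    // Storage option: the driver reads every result at the end.
    List<R> all() {
        return Collections.unmodifiableList(new ArrayList<>(results));
    }
}
```

With something like this in place, code iterating a materialized collection never touches the result: the driver either reads all() at the end or registers a listener that forwards counters to the metric store.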

Regards,

Clément

