crunch-dev mailing list archives

From David Whiting <davidwhit...@gmail.com>
Subject Re: .materialize() returns empty collection on pipeline error?
Date Wed, 28 Jan 2015 19:21:34 GMT
I think "fail catastrophically" is probably exactly what should happen
here. You can always catch and use an empty iterable if it fails. A common
use case here is to do one step, materialize it into a collection or map,
then pass that into a DoFn to use as a small lookup table. With the current
behavior, a failed step hands later steps an empty collection, so they
silently keep running on the cluster with empty lookup tables.
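
A rough sketch of the "catch and use an empty iterable" option, for callers
who really do want the fallback. It uses no Crunch classes: LookupFallback,
orEmpty and the Supplier argument are hypothetical stand-ins for a
materialize call, and it assumes the failure surfaces as an unchecked
exception (such as the CrunchRuntimeException mentioned below).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of the Crunch API.
final class LookupFallback {

    // Runs a materialization step and returns its result. If the step throws
    // an unchecked exception, logs it and returns an empty map instead.
    static <K, V> Map<K, V> orEmpty(Supplier<Map<K, V>> materializeStep) {
        try {
            return materializeStep.get();
        } catch (RuntimeException e) {
            System.err.println("Materialization failed, using empty lookup: " + e.getMessage());
            return Collections.emptyMap();
        }
    }

    public static void main(String[] args) {
        // Stand-in for a materialize call whose pipeline failed.
        Supplier<Map<String, Integer>> failing = () -> {
            throw new IllegalStateException("pipeline failed");
        };

        // Stand-in for a materialize call that succeeded.
        Map<String, Integer> data = new HashMap<>();
        data.put("a", 1);
        Supplier<Map<String, Integer>> working = () -> data;

        System.out.println(orEmpty(failing).size());
        System.out.println(orEmpty(working).get("a"));
    }
}
```

The point of the sketch is that the empty fallback becomes an explicit choice
at the call site; a caller that does not wrap the step sees the exception and
stops before building an empty lookup table.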

On 28 January 2015 at 13:45, Josh Wills <jwills@cloudera.com> wrote:

> Yeah, I think that before, we would just fail catastrophically by throwing
> a CrunchRuntimeException, which I found annoying. Do you prefer that
> behavior? It's certainly something that could be configurable.
>
> J
>
> On Wed, Jan 28, 2015 at 10:36 AM, Jinal Shah <jinalshah2007@gmail.com> wrote:
>
> > I think it was intended, judging from these commits:
> >
> > https://github.com/apache/crunch/commit/3711cea61bded4c90b235a01163ae5f855089917
> > and
> > https://github.com/apache/crunch/commit/ded504eb133fa0814e2d90ff2a662e72a67e04bb
> >
> > Josh can elaborate on this.
> >
> > On Wed, Jan 28, 2015 at 9:26 AM, Mārtiņš Kalvāns <martins.kalvans@gmail.com> wrote:
> >
> > > Hi.
> > >
> > > When a pipeline fails on the cluster with an exception, materialize()
> > > returns an empty collection and just logs an error message.
> > >
> > > I'm (very, very) puzzled about this behaviour:
> > >
> > > https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/materialize/MaterializableIterable.java#L92
> > >
> > > Is this really intended behaviour?
> > >
> > > If so, then some documentation of this behaviour for the materialize()
> > > function would be really nice to have. :)
> > >
> > >
> > > --
> > > Mārtiņš
> > >
> >
>
>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
>
