crunch-dev mailing list archives

From Allan Shoup <allan.sh...@gmail.com>
Subject Re: .materialize() returns empty collection on pipeline error?
Date Wed, 28 Jan 2015 19:27:11 GMT
I also prefer the exception. I only recently found out about this behavior,
and I now have to go back through my code base and add explicit error
handling to detect this case.

On Wed, Jan 28, 2015 at 1:21 PM, David Whiting <davidwhiting@gmail.com>
wrote:

> I think "fail catastrophically" is probably exactly what should happen
> here. You can always catch and use an empty iterable if it fails. A common
> use case here is to do one step, materialize it into a collection or map,
> then pass that into a DoFn to use as a small lookup table. This failure
> mode means that future steps silently continue to execute with empty lookup
> tables as part of their processing on the cluster.
>
> On 28 January 2015 at 13:45, Josh Wills <jwills@cloudera.com> wrote:
>
> > Yeah, I think that before, we would just fail catastrophically by
> > throwing a CrunchRuntimeException, which I found annoying. Do you prefer
> > that behavior? It's certainly something that could be configurable.
> >
> > J
> >
> > On Wed, Jan 28, 2015 at 10:36 AM, Jinal Shah <jinalshah2007@gmail.com>
> > wrote:
> >
> > > I think it was intended, judging from these commits:
> > >
> > > https://github.com/apache/crunch/commit/3711cea61bded4c90b235a01163ae5f855089917
> > > and
> > > https://github.com/apache/crunch/commit/ded504eb133fa0814e2d90ff2a662e72a67e04bb
> > >
> > > Josh can elaborate on this.
> > >
> > > On Wed, Jan 28, 2015 at 9:26 AM, Mārtiņš Kalvāns <
> > > martins.kalvans@gmail.com>
> > > wrote:
> > >
> > > > Hi.
> > > >
> > > > When a pipeline fails on the cluster with an exception, materialize()
> > > > returns an empty collection and just logs an error message.
> > > >
> > > > I'm (very, very) puzzled about this behaviour:
> > > >
> > > > https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/materialize/MaterializableIterable.java#L92
> > > > Is this really intended behaviour?
> > > >
> > > > If so, then some documentation for materialize() function about this
> > > > behaviour would be really nice to have. :)
> > > >
> > > >
> > > > --
> > > > Mārtiņš
> > > >
> > >
> >
> >
> >
> > --
> > Director of Data Science
> > Cloudera <http://www.cloudera.com>
> > Twitter: @josh_wills <http://twitter.com/josh_wills>
> >
>
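
To make the "throw by default, catch if you want an empty iterable" idea
from the thread concrete, here is a minimal sketch in plain Java. It does
not use the Crunch API: StrictMaterialize, PipelineFailedException,
materializeOrThrow and materializeOrEmpty are hypothetical names, and the
runSucceeded flag stands in for whatever success signal the caller has
from running the pipeline.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.function.Supplier;

public class StrictMaterialize {

    /** Hypothetical stand-in for a pipeline failure; not a Crunch class. */
    public static class PipelineFailedException extends RuntimeException {
        public PipelineFailedException(String message) {
            super(message);
        }
    }

    /**
     * Strict variant: hands back the materialized values only when the run
     * succeeded, and throws otherwise, so later steps never see an empty
     * lookup table that is really a hidden failure.
     */
    public static <T> Iterable<T> materializeOrThrow(
            boolean runSucceeded, Supplier<Iterable<T>> materialized) {
        if (!runSucceeded) {
            throw new PipelineFailedException(
                    "pipeline failed; refusing to return an empty collection");
        }
        return materialized.get();
    }

    /**
     * Lenient variant: the caller opts in to "empty on failure" explicitly
     * by catching the exception thrown by the strict variant.
     */
    public static <T> Iterable<T> materializeOrEmpty(
            boolean runSucceeded, Supplier<Iterable<T>> materialized) {
        try {
            return materializeOrThrow(runSucceeded, materialized);
        } catch (PipelineFailedException e) {
            return Collections.<T>emptyList();
        }
    }

    public static void main(String[] args) {
        // Assumed sample data standing in for a small lookup table.
        Supplier<Iterable<String>> lookup = () -> Arrays.asList("a", "b");

        // Successful run: values come through unchanged.
        for (String key : materializeOrThrow(true, lookup)) {
            System.out.println(key);
        }

        // Failed run: the strict variant surfaces the error to the caller.
        try {
            materializeOrThrow(false, lookup);
        } catch (PipelineFailedException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

With this shape, the silent empty-collection behaviour becomes a choice made
at the call site (materializeOrEmpty) rather than the default.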
