crunch-dev mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: .materialize() returns empty collection on pipeline error?
Date Wed, 28 Jan 2015 19:35:05 GMT
Yeah, that's not good. Will file a JIRA to revert-- sorry about that,
everybody.

J

On Wed, Jan 28, 2015 at 11:27 AM, Allan Shoup <allan.shoup@gmail.com> wrote:

> I also prefer the exception. I recently found out about this behavior and
> am now facing a task of going back through my code base to explicitly add
> special error handling to detect this case.
>
> On Wed, Jan 28, 2015 at 1:21 PM, David Whiting <davidwhiting@gmail.com>
> wrote:
>
> > I think "fail catastrophically" is probably exactly what should happen
> > here. You can always catch and use an empty iterable if it fails. A common
> > use case here is to do one step, materialize it into a collection or map,
> > then pass that into a DoFn to use as a small lookup table. This failure
> > mode means that future steps silently continue to execute with empty lookup
> > tables as part of their processing on the cluster.
> >
> > On 28 January 2015 at 13:45, Josh Wills <jwills@cloudera.com> wrote:
> >
> > > Yeah, I think that before, we would just fail catastrophically by throwing
> > > a CrunchRuntimeException, which I found annoying. Do you prefer that
> > > behavior? It's certainly something that could be configurable.
> > >
> > > J
> > >
> > > On Wed, Jan 28, 2015 at 10:36 AM, Jinal Shah <jinalshah2007@gmail.com>
> > > wrote:
> > >
> > > > I think it was intended, based on these commits:
> > > >
> > > > https://github.com/apache/crunch/commit/3711cea61bded4c90b235a01163ae5f855089917
> > > > and
> > > > https://github.com/apache/crunch/commit/ded504eb133fa0814e2d90ff2a662e72a67e04bb
> > > >
> > > > Josh can elaborate on this.
> > > >
> > > > On Wed, Jan 28, 2015 at 9:26 AM, Mārtiņš Kalvāns <martins.kalvans@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi.
> > > > >
> > > > > When a pipeline fails on the cluster with some exception, materialize()
> > > > > returns an empty collection and just logs an error message.
> > > > >
> > > > > I'm (very, very) puzzled about this behaviour:
> > > > >
> > > > > https://github.com/apache/crunch/blob/master/crunch-core/src/main/java/org/apache/crunch/materialize/MaterializableIterable.java#L92
> > > > > Is this really intended behaviour?
> > > > >
> > > > > If so, then some documentation of this behaviour for the materialize()
> > > > > function would be really nice to have. :)
> > > > >
> > > > >
> > > > > --
> > > > > Mārtiņš
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Director of Data Science
> > > Cloudera <http://www.cloudera.com>
> > > Twitter: @josh_wills <http://twitter.com/josh_wills>
> > >
> >
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
