crunch-dev mailing list archives

From "Josh Wills (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-420) Breakpoints Not Working
Date Tue, 17 Jun 2014 20:41:03 GMT


Josh Wills commented on CRUNCH-420:

Yep, reading over it now. So it seems like we have two situations where breakpointing is needed
(I may have this wrong, but I'm going to try to write it up):

1) We have two dependent GBK operations, and we want to signal to the planner where to split
in between them, which is handled by CRUNCH-294.
2) We have a single data prep step that is going to feed multiple downstream GBKs. We don't
want to run it twice in separate jobs (either b/c it's compute intensive, or b/c it does an
amazing job of filtering a large output file), so we mark it as materialized and have it get
created in a single map-only job that then feeds the downstream GBKs, which is handled by
this patch.
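
To make situation 2 concrete, here is a toy sketch in plain Java. It does not use the Crunch API; the class BreakpointSketch and the helpers prep, materialized, and runDownstream are hypothetical names made up for illustration. It only models the idea that a lazily evaluated prep step gets re-run by each downstream consumer unless its output is computed once and reused, which is roughly what marking the step as materialized is meant to achieve.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Collectors;

public class BreakpointSketch {

    // Counts how many times the (hypothetical) expensive prep step executes.
    static final AtomicInteger prepRuns = new AtomicInteger();

    static final List<Integer> INPUT = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

    // Stand-in for the data prep step: filters the input down to a smaller output.
    static List<Integer> prep(List<Integer> input) {
        prepRuns.incrementAndGet();
        return input.stream().filter(x -> x % 2 == 0).collect(Collectors.toList());
    }

    // Wraps a lazy step so its output is computed once and then reused,
    // loosely analogous to producing the prep output in a single map-only job.
    static <T> Supplier<T> materialized(final Supplier<T> step) {
        return new Supplier<T>() {
            private T cached;
            private boolean done;

            @Override
            public T get() {
                if (!done) {
                    cached = step.get();
                    done = true;
                }
                return cached;
            }
        };
    }

    // Two downstream grouping steps (stand-ins for GBKs) that each pull the prep output.
    // Returns the number of times the prep step actually ran.
    static int runDownstream(Supplier<List<Integer>> prepStep) {
        prepRuns.set(0);
        Map<Integer, List<Integer>> byMod3 =
            prepStep.get().stream().collect(Collectors.groupingBy(x -> x % 3));
        Map<Integer, List<Integer>> byMod4 =
            prepStep.get().stream().collect(Collectors.groupingBy(x -> x % 4));
        return prepRuns.get();
    }

    // No breakpoint: each downstream step re-executes the prep step.
    static int runWithoutBreakpoint() {
        return runDownstream(() -> prep(INPUT));
    }

    // With a breakpoint: the prep step runs once and both downstream steps share its output.
    static int runWithBreakpoint() {
        return runDownstream(materialized(() -> prep(INPUT)));
    }

    public static void main(String[] args) {
        System.out.println("prep runs without breakpoint: " + runWithoutBreakpoint());
        System.out.println("prep runs with breakpoint:    " + runWithBreakpoint());
    }
}
```

In this toy model the un-materialized prep step runs once per downstream consumer, while the materialized one runs a single time. The real planner decides job boundaries rather than caching values in memory, so this is only an analogy for why the breakpoint matters.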

Is there another breakpoint situation I'm missing? Is there a reduce-side version of this?

> Breakpoints Not Working
> -----------------------
>                 Key: CRUNCH-420
>                 URL:
>             Project: Crunch
>          Issue Type: Bug
>         Environment: Crunch 0.8.2
>            Reporter: Allan Shoup
>            Assignee: Josh Wills
>         Attachments:, CRUNCH-420.patch, testBreakpoint_plan.png
> Reading through CRUNCH-294, it looks like materialize is supposed to function as a breakpoint
> to the planner. I've seen several plans where it appeared to me a particular DoFn shouldn't
> have been repeated, but it was.
> I'll attach some supporting material.

This message was sent by Atlassian JIRA
