crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-420) Breakpoints Not Working
Date Tue, 17 Jun 2014 20:41:03 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14034330#comment-14034330 ]

Josh Wills commented on CRUNCH-420:
-----------------------------------

Yep, reading over it now. So it seems like we have two situations where breakpointing is needed
(I may have this wrong, but I'm going to try to write it up):

1) We have two dependent GBK operations, and we want to signal to the planner where to split
in between them, which is handled by CRUNCH-294.
2) We have a single data prep step that is going to feed multiple downstream GBKs. We don't
want to run it twice in separate jobs (either b/c it's compute-intensive, or b/c it does an
amazing job of filtering a large output file), so we mark it as materialized and have it
created in a single map-only job that then feeds the downstream GBKs, which is handled by
this patch.
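To make situation 2 concrete, here is a rough sketch in plain Java. It is a hypothetical toy model, not Crunch's planner and not the CRUNCH-420 patch: `BreakpointSketch`, `prep`, `runPrep`, and `gbk` are made-up names, and the "jobs" are just in-memory passes over a list. It only counts how often the shared prep step runs when two downstream GBKs consume it, with and without a materialization breakpoint.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical toy model of the planning choice in situation 2. It is NOT
// Crunch's planner and uses no Crunch APIs; it just counts invocations of a
// shared "data prep" step feeding two downstream group-by-key jobs.
public class BreakpointSketch {

    // Number of times the shared prep step has been invoked.
    private int prepCalls = 0;

    // Stand-in for an expensive, highly selective filter (hypothetical).
    private boolean prep(String record) {
        prepCalls++;
        return !record.isEmpty();
    }

    // One "map" pass: runs the prep step over every input record.
    private List<String> runPrep(List<String> input) {
        return input.stream().filter(this::prep).collect(Collectors.toList());
    }

    // Stand-in for a GBK job: groups records by a derived key.
    private static Map<String, List<String>> gbk(List<String> records,
                                                 Function<String, String> keyFn) {
        return records.stream().collect(Collectors.groupingBy(keyFn));
    }

    // No breakpoint: each downstream job fuses the prep step into its own
    // map phase, so prep runs once per job over the full input.
    public int withoutBreakpoint(List<String> input) {
        prepCalls = 0;
        gbk(runPrep(input), r -> r.substring(0, 1));
        gbk(runPrep(input), r -> String.valueOf(r.length()));
        return prepCalls;
    }

    // Breakpoint: a single map-only pass runs prep once and keeps the result;
    // both downstream GBKs then read that materialized output.
    public int withBreakpoint(List<String> input) {
        prepCalls = 0;
        List<String> materialized = runPrep(input);
        gbk(materialized, r -> r.substring(0, 1));
        gbk(materialized, r -> String.valueOf(r.length()));
        return prepCalls;
    }

    public static void main(String[] args) {
        List<String> input = List.of("apple", "", "banana", "cherry");
        BreakpointSketch sketch = new BreakpointSketch();
        System.out.println("without breakpoint: " + sketch.withoutBreakpoint(input));
        System.out.println("with breakpoint:    " + sketch.withBreakpoint(input));
    }
}
```

With the four-record input in `main`, prep runs 8 times without the breakpoint and 4 times with it. In this toy model the repeated prep work grows with the number of downstream jobs, which is the kind of duplicated DoFn the bug report describes.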

Is there another breakpoint situation I'm missing? Is there a reduce-side version of this
problem?

> Breakpoints Not Working
> -----------------------
>
>                 Key: CRUNCH-420
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-420
>             Project: Crunch
>          Issue Type: Bug
>         Environment: Crunch 0.8.2
>            Reporter: Allan Shoup
>            Assignee: Josh Wills
>         Attachments: Breakpoint2IT.java, CRUNCH-420.patch, testBreakpoint_plan.png
>
>
> Reading through CRUNCH-294, it looks like materialize is supposed to function as a breakpoint
> to the planner. I've seen several plans where it appeared to me a particular DoFn shouldn't
> have been repeated, but it was.
> I'll attach some supporting material.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
