incubator-crunch-dev mailing list archives

From Matthias Friedrich
Subject Re: Generating DOT files for Crunch job plans
Date Sat, 27 Oct 2012 13:39:48 GMT

On Saturday, 2012-10-27, Gabriel Reid wrote:
> In the few times that I've debugged issues in the planner in Crunch,
> it always takes me a bit of time to figure out (again) how things
> work there. I've been thinking/planning of writing some more inline
> docs and doing a bit of refactoring in the code to help myself (and
> others) with doing this in the future, but something else that I was
> thinking of was the generation of DOT[1] files for pipelines so that
> it's easier to visualize what's going on.

That's a great idea. It will help win over prospective users who
wonder whether Crunch performs as well as a sequence of hand-written
MR jobs.

There are other ways to generate graphs in Java, BTW, but in my
experience none of them produces output that matches dot/graphviz. In
my opinion we shouldn't run dot ourselves, though, because most users
don't have dot installed. We should just generate the output and let
users call dot themselves.

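
To illustrate what "just generate the output" could look like, here is a
minimal sketch that turns a map of node names to successor names into DOT
text. The DotSketch class and its toDot method are hypothetical names made
up for this example, and the map is only a stand-in for whatever the
planner's real graph structure is; nothing here is existing Crunch code.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Crunch: renders a simple adjacency map as DOT.
final class DotSketch {
    private DotSketch() {}

    /** Emits one "source -> target" edge per successor, in map iteration order. */
    static String toDot(String graphName, Map<String, List<String>> edges) {
        StringBuilder sb = new StringBuilder();
        sb.append("digraph ").append(quote(graphName)).append(" {\n");
        for (Map.Entry<String, List<String>> entry : edges.entrySet()) {
            for (String target : entry.getValue()) {
                sb.append("  ").append(quote(entry.getKey()))
                  .append(" -> ").append(quote(target)).append(";\n");
            }
        }
        sb.append("}\n");
        return sb.toString();
    }

    /** Wraps a name in double quotes, escaping embedded double quotes only. */
    private static String quote(String s) {
        return "\"" + s.replace("\"", "\\\"") + "\"";
    }
}
```

A user who has graphviz installed could write the returned string to a
file, say plan.dot, and render it with `dot -Tpng plan.dot -o plan.png`.
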
> I'm sure that functionality like this can be useful (at least to me,
> as I was just using it in a somewhat ad-hoc way to debug
> CRUNCH-102), but I'm not sure if this is something we want to expose
> easily, or keep pretty hidden to just use for debugging. I believe
> Pig provides this same functionality with the "explain" command.
> Any thoughts on adding this, particularly around how we could/should
> expose it in the API?

I think we should make it available to users and make it really easy
to access. I'm not sure about the API, though. Since it's really
cheap to create, we could always generate the dot output, store it
inside the Configuration instance, and provide a static utility class
to access it? A while ago we discussed moving the debugging/log4j
manipulation logic out of MRPipeline; perhaps we can use a single
CrunchDebug utility for both.
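
Roughly what I have in mind for the accessor, as a sketch only: the
planner stores the DOT text under a well-known key and a static utility
reads it back. To keep the example self-contained it uses a plain
Map<String, String> as a stand-in for the Hadoop Configuration (which has
comparable set/get methods for string values). The key name and both
method names are assumptions invented for this example, not an existing
API.

```java
import java.util.Map;

// Hypothetical utility; a Map stands in for the Hadoop Configuration here.
final class CrunchDebug {
    // Assumed key name, chosen only for illustration.
    static final String PLAN_DOT_KEY = "crunch.debug.plan.dot";

    private CrunchDebug() {}

    /** Would be called by the planner once the job plan has been built. */
    static void storePlanDot(Map<String, String> conf, String dot) {
        conf.put(PLAN_DOT_KEY, dot);
    }

    /** Returns the stored DOT text, or null if no plan has been stored. */
    static String getPlanDot(Map<String, String> conf) {
        return conf.get(PLAN_DOT_KEY);
    }
}
```

With something like this, user code would only need the one static call
to get at the plan, and the same class could later host the log4j
helpers.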

