crunch-dev mailing list archives

From Matthias Friedrich
Subject Re: About status web page
Date Wed, 27 Feb 2013 07:30:07 GMT
On Wednesday, 2013-02-27, Chao Shi wrote:
> I'm developing a complex pipeline (30+ MRs plus lots of joins). I have a
> hard time understanding which part of the pipeline takes the most running
> time and how much intermediate output it produces. Crunch's optimization
> work is great, but it makes the execution plan difficult to understand.
> Each time I modify the pipeline, I have to dump the dot file and run
> graphviz to generate a new picture and check whether anything is wrong.
>
> About security, I'm not familiar with how Hadoop does it. I will try to
> reuse Hadoop's HttpServer (does it have something to do with security?).
> The bottom line is that this feature will be disabled by default, and
> users can enable it at their own risk.

OK, sounds good.

> If this feature is enabled, the user can choose between an unused port
> and a specified port. I don't yet know how the user would learn the
> randomly picked port (via the log?). I will work on a prototype first
> and see whether the status page is generally useful.

Yeah, logging the URL would probably be the only thing that works, not
counting fancy stuff like mDNS ;-)
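
To make the "unused port plus logged URL" idea concrete, here is a rough
sketch. It is only an illustration: it uses the JDK's built-in
com.sun.net.httpserver.HttpServer as a stand-in rather than Hadoop's
HttpServer, and the class name StatusPageSketch, the /status path and the
placeholder response body are all made up for this example. Passing port 0
asks the OS for any free port; the port actually bound is then read back
and logged.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.logging.Logger;

// Hypothetical sketch; none of these names are Crunch or Hadoop API.
public class StatusPageSketch {
  private static final Logger LOG =
      Logger.getLogger(StatusPageSketch.class.getName());

  /**
   * Starts a tiny status server on localhost.
   * requestedPort == 0 lets the OS pick any unused port;
   * any other value binds to that specific port.
   */
  public static HttpServer start(int requestedPort) throws IOException {
    HttpServer server = HttpServer.create(
        new InetSocketAddress("localhost", requestedPort), 0);
    server.createContext("/status", exchange -> {
      // Placeholder body; a real page would render the pipeline state.
      byte[] body = "pipeline status placeholder\n"
          .getBytes(StandardCharsets.UTF_8);
      exchange.sendResponseHeaders(200, body.length);
      try (OutputStream out = exchange.getResponseBody()) {
        out.write(body);
      }
    });
    server.start();
    // The log line is the only place the user learns a random port.
    LOG.info("Status page available at " + url(server));
    return server;
  }

  /** Builds the URL from the port the server actually bound to. */
  public static String url(HttpServer server) {
    return "http://localhost:" + server.getAddress().getPort() + "/status";
  }
}
```

Calling StatusPageSketch.start(0) would pick a free port, while
StatusPageSketch.start(8042) would request that specific port (8042 is an
arbitrary example). The caller should invoke stop(0) on the returned
server when the pipeline finishes.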

In my opinion, we should try to get this done with the dependencies that
we already get through Hadoop. Each additional library we add to Crunch
will cause interoperability problems for someone.

