hive-dev mailing list archives

From "Jeff Hammerbacher (JIRA)" <>
Subject [jira] Commented: (HIVE-1107) Generic parallel execution framework for Hive (and Pig, and ...)
Date Wed, 17 Nov 2010 22:15:17 GMT


Jeff Hammerbacher commented on HIVE-1107:

Okay, thanks. Let me try to pull apart the issues so that I can understand them:

bq. Oozie is more complex than Pig and HIVE put together. Compare their manuals, both in terms
of length and readability.

bq. Oozie is (nearly?) turing complete XML, not easily human readable script, and scheduling
one job takes far too much of it.

bq. Also, there is no need to force Oozie either, people can use Azkaban etc. for workflow.

Each of these objections seems moot, given that Oozie would be targeted by the Hive and Pig developers,
not the Hive and Pig users. No Hive or Pig user would be required to write Oozie: the configuration
files would be generated by the Hive and Pig query planners, from my understanding.

bq. I believe, mid-to-long term, that Pig/Hive will get significantly smarter about the way
they construct MR jobs - they will want to run some of the nodes in the DAG, wait for their
output (e.g. a sampler) and then make ever more complicated decisions to modify the DAG. I
believe Oozie isn't the right tool to be using for this purpose.

Adaptive query optimization is indeed a noble goal. Oozie seems to think at the level of workflow
rather than dataflow, so as you say, it may not be an appropriate layer for performing these
optimizations. I'm not sure it detracts from the ability of Hive or Pig to perform adaptive
query optimization, though.

Anyways, thanks for the discussion. We're certainly thinking through these issues as well.

> Generic parallel execution framework for Hive (and Pig, and ...)
> ----------------------------------------------------------------
>                 Key: HIVE-1107
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Carl Steinbach
> Pig and Hive each have their own libraries for handling plan execution. As we prepare
> to invest more time improving Hive's plan execution mechanism we should also start to consider
> ways of building a generic plan execution mechanism that is capable of supporting the needs
> of Hive and Pig, as well as other Hadoop data flow programming environments.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
