hadoop-hive-dev mailing list archives

From "Russell Jurney (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1107) Generic parallel execution framework for Hive (and Pig, and ...)
Date Thu, 15 Jul 2010 17:57:50 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888870#action_12888870 ]

Russell Jurney commented on HIVE-1107:
--------------------------------------

At Jeff's suggestion, my comments on this ticket for Hive and Pig follow.

Oozie has been suggested as a solution to this ticket, but in my opinion it is far too complex
to be appropriate for Pig or Hive.  A scheduler should not be more complex than the language
it schedules, and Oozie is more complex than Pig and Hive put together.  Compare their manuals,
both in length and in readability.  Furthermore, Oozie workflows are written in (nearly?)
Turing-complete XML rather than an easily human-readable script, and scheduling one job takes
far too much of it.

Pig and Hive aim to deliver simplicity and accessibility.  In time Oozie may mature, but it
is not there yet.  The features are present, but the open source interface is extremely raw.
The only simple interface to Oozie is a proprietary GUI.  Perhaps the next major release
will be an improvement.

A tight binding between these projects would cause problems for LinkedIn, as we use Azkaban to
schedule Pig jobs.  Scheduling a job in Azkaban consists of creating a zip file of your job's
content, inserting a very brief config (typically 3-6 lines), and issuing a one-line command.
The web interface to Azkaban is free.  This makes it a more appropriate choice for this ticket
than Oozie, but making Azkaban tightly bound to Pig would be a terrible idea too.
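
To give a rough sense of how small the packaging step described above is, here is a minimal
Python sketch that bundles a three-line job config and a Pig script into a zip archive.  It
is an illustration under assumptions, not Azkaban's documented procedure: the job name, the
script, and the config keys and values are all hypothetical, and the one-line submit command
is left out because its exact form is not given here.

```python
import io
import zipfile

# Hypothetical job config in key=value style. The key names and values are
# assumptions made for illustration; check them against your Azkaban version.
JOB_CONFIG = "\n".join([
    "type=pig",
    "pig.script=wordcount.pig",
    "retries=1",
]) + "\n"

# Hypothetical Pig script that the job would run.
PIG_SCRIPT = "lines = LOAD 'input' AS (line:chararray);\nDUMP lines;\n"


def build_job_zip(job_name, config, script_name, script):
    """Return the bytes of a zip archive holding one job config and one script."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(f"{job_name}.job", config)  # the brief config
        archive.writestr(script_name, script)        # the job's content
    return buffer.getvalue()


archive_bytes = build_job_zip("wordcount", JOB_CONFIG, "wordcount.pig", PIG_SCRIPT)
```

The archive is built in memory so the sketch touches no files; writing `archive_bytes` to
disk would give the zip file to hand to the scheduler.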

We should be very careful about adding enterprise baggage to these tools that the vast
majority of users simply do not need.  Convention over configuration is at the core of Pig
and Hive.  Let's not spoil that.

> Generic parallel execution framework for Hive (and Pig, and ...)
> ----------------------------------------------------------------
>
>                 Key: HIVE-1107
>                 URL: https://issues.apache.org/jira/browse/HIVE-1107
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Carl Steinbach
>
> Pig and Hive each have their own libraries for handling plan execution. As we prepare
> to invest more time improving Hive's plan execution mechanism we should also start to consider
> ways of building a generic plan execution mechanism that is capable of supporting the needs
> of Hive and Pig, as well as other Hadoop data flow programming environments.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

