hadoop-pig-dev mailing list archives

From "Jeff Hammerbacher" <jeff.hammerbac...@gmail.com>
Subject Re: Compile Pig and Hive queries to LINQ expression trees?
Date Fri, 31 Oct 2008 04:14:43 GMT
Hey Alan,

The sharing of logical plans seemed like the first place to start on
the way to a shared execution environment. By sharing an API, the
execution environments could be altered under the covers until they
match. The LINQ data model works for Hive, from what I can tell.
It's not clear to me that the Pig data model is not also handled by
LINQ's expressions.
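
As a rough illustration of the idea in the paragraph above, here is a
minimal Python sketch of two front ends producing the same logical plan
while the consumers of that plan stay separate. Everything in it is an
assumption made for illustration: the node classes (Scan, Filter,
Project), the two plan_from_* functions, and the execute and explain
consumers are hypothetical and do not correspond to actual Pig, Hive, or
LINQ classes. The front ends build their plans by hand rather than
parsing any query text.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical logical-plan nodes (illustrative names only, not real
# Pig, Hive, or LINQ types). Frozen dataclasses compare by value, so two
# plans with the same shape are equal.
@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class Filter:
    child: object
    column: str
    op: str          # ">" or "==" in this sketch
    value: object

@dataclass(frozen=True)
class Project:
    child: object
    columns: Tuple[str, ...]

def plan_from_pig_style():
    # Stands in for: B = FILTER A BY age > 30; C = FOREACH B GENERATE name;
    return Project(Filter(Scan("users"), "age", ">", 30), ("name",))

def plan_from_hive_style():
    # Stands in for: SELECT name FROM users WHERE age > 30
    return Project(Filter(Scan("users"), "age", ">", 30), ("name",))

OPS = {">": lambda a, b: a > b, "==": lambda a, b: a == b}

def execute(plan, tables):
    """One consumer of the shared plan: interpret it over in-memory rows."""
    if isinstance(plan, Scan):
        return list(tables[plan.table])
    if isinstance(plan, Filter):
        rows = execute(plan.child, tables)
        return [r for r in rows if OPS[plan.op](r[plan.column], plan.value)]
    if isinstance(plan, Project):
        rows = execute(plan.child, tables)
        return [{c: r[c] for c in plan.columns} for r in rows]
    raise TypeError(f"unknown plan node: {plan!r}")

def explain(plan):
    """A second, independent consumer: render the same plan as text."""
    if isinstance(plan, Scan):
        return f"Scan({plan.table})"
    if isinstance(plan, Filter):
        return f"Filter({plan.column} {plan.op} {plan.value!r}, {explain(plan.child)})"
    if isinstance(plan, Project):
        return f"Project({', '.join(plan.columns)}, {explain(plan.child)})"
    raise TypeError(f"unknown plan node: {plan!r}")

if __name__ == "__main__":
    tables = {"users": [{"name": "al", "age": 25}, {"name": "bo", "age": 41}]}
    plan = plan_from_hive_style()
    print(plan == plan_from_pig_style())
    print(explain(plan))
    print(execute(plan, tables))
```

The point of the sketch is only that once both languages emit the same
tree, each consumer can be changed or replaced without touching either
front end.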

In general, merging execution environments seems fairly tedious, but
sharing the logical plan seems much less difficult. Just wanted to
hear the opinions of others on the topic and hear thoughts on


On Thu, Oct 30, 2008 at 10:05 AM, Alan Gates <gates@yahoo-inc.com> wrote:
> Jeff,
> If I understand your proposal it is that Hive SQL and Pig Latin would both
> compile into LINQ Expression Trees as their logical plans, but continue to
> have separate backends for executing the queries.  Is that correct?
> I'm not seeing the benefit there.  I see the benefit of sharing logical
> plans and a merged backend that can execute both Pig Latin and Hive SQL.
>  These benefits would include focusing more developers on what are probably
> very similar issues that we need to address, plus allowing both our user
> communities to choose which language to express their programs in without
> needing to maintain both systems.  I also see all of the challenges of
> merging two projects, the fact that we have differing data models, etc.
> What do you see as the benefits of sharing just the logical plans?
> Alan.
> On Oct 27, 2008, at 1:16 PM, Jeff Hammerbacher wrote:
>> Hey,
>> There's been some discussion for a while about having a common logical
>> plan format for Pig and Hive to pass to their physical plan
>> generators. Erik Meijer gave a talk recently on LINQ Expression Trees
>> that made me think they would serve as an excellent intermediate data
>> structure. You can read more about them here:
>> http://msdn.microsoft.com/en-us/library/bb882636.aspx, and you can
>> read the proposal for Erik's presentation here [PDF]:
>> http://research.microsoft.com/~emeijer/Papers/Cloud%20computing%20workshop%20proposal%20Draft.pdf.
>> I was planning to save this discussion for a time when I understood
>> the plan structures in both Pig and Hive, but given the discussion
>> around Hive future plans going on right now, I figured now's as good a
>> time as any to get it started.
>> Later,
>> Jeff
