hive-user mailing list archives

From Qing Yan <qing...@gmail.com>
Subject Re: Hive vs. DryadLINQ
Date Fri, 16 Oct 2009 03:43:28 GMT
Hi Jeff,

Actually I care less about Dryad's implementation; few people will adopt it
today given its immature and/or proprietary nature. But strictly from the
design and architecture perspective, reading through their literature
makes one feel Dryad has certain advantages over Hadoop/Hive.

E.g. Hive treats Hadoop as an execution black box. Say a Hadoop job
involves a large dataset: if an error in part of the data causes the job
to fail, there is no easy way for Hive to know what happened, and the whole
job needs to be re-run later, whereas in Dryad you get more control and
fine-tuning opportunities.
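
To make the contrast concrete, here is a minimal Python sketch of the kind of
per-vertex control I have in mind. It is not Dryad's or Hadoop's actual API:
run_dag, vertices, deps and flaky_filter are names made up for illustration,
and everything runs in one process. The point is only that when the scheduler
sees the whole DAG, a failed vertex can be retried on its own while the
outputs of vertices that already succeeded are kept.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dag(vertices, deps, max_attempts=3):
    """Run each vertex after its upstream vertices, retrying only the one that fails.

    vertices: name -> callable taking a dict of upstream outputs
    deps:     name -> list of upstream vertex names
    Both are hypothetical structures for this sketch.
    """
    results = {}   # outputs of vertices that have completed
    attempts = {}  # how many times each vertex was executed
    for name in TopologicalSorter(deps).static_order():
        inputs = {d: results[d] for d in deps.get(name, ())}
        for attempt in range(1, max_attempts + 1):
            attempts[name] = attempt
            try:
                results[name] = vertices[name](inputs)
                break  # this vertex is done; upstream results stay untouched
            except Exception:
                if attempt == max_attempts:
                    raise  # give up only after this vertex exhausted its retries
    return results, attempts


# A three-vertex flow in which the middle vertex fails once, then succeeds.
calls = {"filter": 0}


def flaky_filter(inputs):
    calls["filter"] += 1
    if calls["filter"] == 1:
        raise RuntimeError("simulated transient failure")
    return [x for x in inputs["read"] if x % 2 == 0]


vertices = {
    "read": lambda inputs: [1, 2, 3, 4],
    "filter": flaky_filter,
    "sum": lambda inputs: sum(inputs["filter"]),
}
deps = {"read": [], "filter": ["read"], "sum": ["filter"]}

results, attempts = run_dag(vertices, deps)
```

In this toy run only "filter" is executed twice; "read" is not repeated. A real
system would also have to persist intermediate outputs and decide where to
re-run the vertex, which the sketch ignores.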

As for implementing the Dryad model of query execution over HDFS and
underneath HiveQL, the question is how much Hive depends on Map/Reduce.
It is probably difficult to share the same translator/optimizer between
Hadoop and Dryad without sacrificing Dryad's capabilities. We could make
Dryad operate only in M/R mode, but why bother :-P



Regards

Qing

On Fri, Oct 16, 2009 at 1:44 AM, Jeff Hammerbacher <hammer@cloudera.com> wrote:

> Hey Qing,
>
> You can download Dryad and see for yourself:
> http://connect.microsoft.com/site/sitehome.aspx?SiteID=891. There's no
> accompanying distributed file system, unfortunately, and I've never seen a
> benchmark of Dryad scaling to more than 300 nodes, so it's not clear that
> it's the "right" model for all workloads. There's certainly room for a
> richer set of physical operators in the Hadoop project, but the nice thing
> about Hadoop and Hive is that it's a full suite of storage, data flow
> execution, and a higher-level syntax that works today at scale. If you'd
> like to try your hand at an implementation of the Dryad model of query
> execution over HDFS and underneath HiveQL, that would certainly be an
> interesting project.
>
> Regards,
> Jeff
>
>
> On Thu, Oct 15, 2009 at 12:31 AM, Qing Yan <qingyan@gmail.com> wrote:
>
>> Hi,
>>
>>    Has anyone looked into the Microsoft Dryad project?
>>
>>    Their basic idea is to use a DAG (connecting computational "vertices" with
>> communication "edges") to model distributed computing flows. And they have
>> something called DryadLINQ which seems to be the Hive equivalent.
>>
>>      Since the DAG model doesn't distinguish between the inter-job (workflow)
>> and intra-job (map/reduce, etc.) layers, their approach of doing query
>> translation, workflow/job scheduling, and execution in one box may offer
>> better optimization and fine-tuning opportunities than the Hadoop/Hive
>> combo.
>>
>>    Also, given that the majority of the hard work will be encapsulated and
>> performed by the translation/optimizing layer, the simplicity and
>> beauty of Map/Reduce becomes irrelevant or even a hindrance, because
>> it doesn't permit the more generic and flexible
>> operations that Dryad does.
>>
>>
>>   Seems M$ got it right this time, at least on paper :-P ... thoughts?
>>
>>
>>
>>  Qing
>>
>>
>>
>>
>>
>
>
