hive-dev mailing list archives

From Makoto Yui
Subject Re: [PROPOSAL] Hivemall incubation
Date Fri, 21 Nov 2014 19:50:02 GMT
Hi Nick,

Thank you for the comments.

(2014/11/22 3:42), Nick Dimiduk wrote:
> I would also encourage you to consider joining forces with DataFu,
> rather than "competing". I think there's a real appetite for a holistic
> toolbox of patterns and implementations that can span these projects.
> From my understanding, there's nothing about DataFu that's unique to
> Pig, they just need the work done to abstract away the Pig bits and
> implement the Hive interfaces.

My current understanding is that DataFu is a collection of UDFs for
Apache Pig. Since DataFu does not yet support Hive interfaces, is
extending DataFu to Hive a direction the DataFu community has agreed on?

My concern is that merging the Hivemall codebase into DataFu would
complicate DataFu's build and packaging process and blur the project's
target and objectives.

I do not think that Hivemall competes with DataFu because
1) some users prefer Pig and others prefer Hive, and
2) Pig/DataFu is useful for tasks to which HiveQL is unsuited (e.g.,
complex feature engineering steps). After preprocessing with DataFu,
Hivemall can be applied for classification/regression in a scalable way
in Hive.

> Is there anything about Hivemall that's unique to Hive, that wouldn't be
> applicable to Pig as well?

The techniques used in Hivemall (e.g., training data amplification that
emulates iterative training, and machine learning algorithms implemented
as table-generating functions) could be applicable to Apache Pig.
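
To make the amplification idea concrete, here is a minimal sketch in plain
Python. It is not Hivemall's actual implementation: the function names
`amplify` and `train_logistic_sgd`, the toy data, and the learning rate are
all assumptions made for illustration. Each training row is repeated several
times and shuffled, so that a single pass of an online learner over the
amplified data roughly emulates several training iterations.

```python
import math
import random


def amplify(rows, times, seed=0):
    """Repeat every training row `times` times, then shuffle (hypothetical helper)."""
    rng = random.Random(seed)
    amplified = [row for row in rows for _ in range(times)]
    rng.shuffle(amplified)
    return amplified


def train_logistic_sgd(rows, n_features, eta=0.1):
    """One pass of SGD for logistic regression; returns the learned weights."""
    w = [0.0] * n_features
    for features, label in rows:
        z = sum(wi * xi for wi, xi in zip(w, features))
        p = 1.0 / (1.0 + math.exp(-z))
        # Gradient step on the log-loss for this single example.
        for i, xi in enumerate(features):
            w[i] += eta * (label - p) * xi
    return w


# Toy data (assumed): the label is 1 when the first feature is positive.
# The second feature is a constant bias term.
data = [([1.0, 1.0], 1), ([2.0, 1.0], 1), ([-1.0, 1.0], 0), ([-2.0, 1.0], 0)]

# A single pass over 5x-amplified data stands in for 5 training iterations.
weights = train_logistic_sgd(amplify(data, times=5), n_features=2)
```

In a SQL-like engine the same shape would presumably be expressed as a
table-generating function that emits the repeated rows, followed by a
training function that consumes them in one scan.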

However, I am not a heavy user of Pig, and porting Hivemall to Pig would
require a substantial amount of work. So, for now, I plan to stick with
HiveQL interfaces (Hive, HCatalog, and Tez as the software stack of
Hivemall) in developing Hivemall, because a SQL-like interface is
friendly to a broader range of developers.


Makoto YUI
Information Technology Research Institute, AIST.
