hive-user mailing list archives

From Robin Morris <>
Subject Re: question about machine learning on Hive
Date Fri, 18 Jan 2013 04:57:55 GMT
In a similar way, ML algorithms can be put into a Hive UDAF.  I'm working on this at the moment,
and it's proved quite straightforward to integrate liblinear into a UDAF.  As Igor notes,
by setting the number of reducers, you can set the number of parallel learners.
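A minimal sketch of the shape such an aggregate could take, assuming plain Java with no Hive dependency. The hypothetical class `LinearModelAggregator` only mirrors the iterate / terminatePartial / merge / terminate lifecycle of Hive's `GenericUDAFEvaluator`. It trains logistic regression with plain SGD instead of calling liblinear. Merging partial models by a row-count-weighted average of their weights is also an assumption, not necessarily what the UDAF described above does.

```java
import java.util.Arrays;

// Hypothetical sketch: mirrors the iterate / terminatePartial / merge / terminate
// lifecycle of a Hive GenericUDAFEvaluator, but is plain Java with no Hive
// dependency, and uses SGD logistic regression in place of liblinear.
class LinearModelAggregator {
    // Aggregation buffer: the weights learned so far plus the rows that fed them.
    static final class Buffer {
        double[] w;
        long n;
        Buffer(int dim) { w = new double[dim]; n = 0; }
    }

    private final int dim;
    private final double learningRate;

    LinearModelAggregator(int dim, double learningRate) {
        this.dim = dim;
        this.learningRate = learningRate;
    }

    Buffer newBuffer() { return new Buffer(dim); }

    // iterate: one SGD step on a single (features, label) row, label in {0, 1}.
    void iterate(Buffer buf, double[] x, int label) {
        double z = 0.0;
        for (int i = 0; i < dim; i++) z += buf.w[i] * x[i];
        double p = 1.0 / (1.0 + Math.exp(-z));   // predicted P(label = 1)
        double g = label - p;                     // log-likelihood gradient w.r.t. z
        for (int i = 0; i < dim; i++) buf.w[i] += learningRate * g * x[i];
        buf.n++;
    }

    // terminatePartial: the partial model handed from the map side to a reducer.
    Buffer terminatePartial(Buffer buf) { return buf; }

    // merge: combine two partial models by a row-count-weighted average of the
    // weights (an assumed merge strategy; others are possible).
    void merge(Buffer into, Buffer other) {
        long total = into.n + other.n;
        if (total == 0) return;
        for (int i = 0; i < dim; i++) {
            into.w[i] = (into.w[i] * into.n + other.w[i] * other.n) / total;
        }
        into.n = total;
    }

    // terminate: the final weight vector, e.g. to be emitted as an array column.
    double[] terminate(Buffer buf) { return Arrays.copyOf(buf.w, dim); }
}
```

In a real UDAF each method would also unpack its arguments through Hive `ObjectInspector`s, and the buffer would be serialized between the map and reduce stages; that plumbing is left out of this sketch.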


From: Igor Tatarinov <<>>
Reply-To: "<>" <<>>
Date: Thursday, January 17, 2013 1:29 PM
To: "<>" <<>>
Subject: Re: question about machine learning on Hive

Here is how Twitter does it with Pig:

We use a similar approach, and I think that Pig, being somewhat lower-level with better support
for nested objects, is a better tool than Hive. It should be possible to do something similar
with Hive, but we haven't tried. The trick is to implement the learner as a serializer. Then
the number of reducers determines how many parallel learners (bags) you can run.
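The "one learner per reducer" idea can be sketched in a single JVM as follows, assuming each partition stands in for one reducer. `PartitionedEnsemble`, the perceptron learner, the modulo routing, and majority voting are all hypothetical choices made for illustration; this is not Twitter's or Hive's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: each of numLearners partitions plays the role of one
// reducer and trains its own independent model; the resulting ensemble
// predicts by majority vote.
class PartitionedEnsemble {
    // A minimal perceptron with a bias term; labels are +1 / -1.
    static final class Perceptron {
        final double[] w;
        double b;
        Perceptron(int dim) { w = new double[dim]; }

        int predict(double[] x) {
            double s = b;
            for (int i = 0; i < w.length; i++) s += w[i] * x[i];
            return s >= 0 ? 1 : -1;
        }

        void train(double[] x, int y) {
            if (predict(x) != y) {   // perceptron rule: update only on mistakes
                for (int i = 0; i < w.length; i++) w[i] += y * x[i];
                b += y;
            }
        }
    }

    final List<Perceptron> learners = new ArrayList<>();

    PartitionedEnsemble(int numLearners, int dim) {
        for (int k = 0; k < numLearners; k++) learners.add(new Perceptron(dim));
    }

    // Route a row to one learner, as the shuffle routes a key to a reducer.
    // rowId stands in for whatever distribution key the real job would use.
    void train(long rowId, double[] x, int y) {
        int k = (int) Math.floorMod(rowId, (long) learners.size());
        learners.get(k).train(x, y);
    }

    // Majority vote over all learners (ties go to +1).
    int predict(double[] x) {
        int votes = 0;
        for (Perceptron p : learners) votes += p.predict(x);
        return votes >= 0 ? 1 : -1;
    }
}
```

More partitions means more parallelism but less training data per learner; in a real Hive job of this era the partition count would typically come from the reducer setting (e.g. `set mapred.reduce.tasks=N;`).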


On Thu, Jan 17, 2013 at 1:23 PM, qiaoresearcher <<>> wrote:

How can I run machine learning algorithms (any ML algorithm) directly in Hive? Assume the
input and output are already stored as Hive tables.

PS: I know Mahout is available, but I would prefer to run machine learning algorithms directly
in Hive.

Many thanks,
