hive-user mailing list archives

From Makoto YUI <>
Subject [ANN] Hivemall: Hive scalable machine learning library
Date Wed, 02 Oct 2013 08:21:36 GMT
Hello all,

My employer, AIST, has given the thumbs up to open source our machine
learning library, named Hivemall.

Hivemall is a scalable machine learning library running on Hive/Hadoop,
licensed under the LGPL 2.1.

Hivemall provides machine learning functionality as well as feature
engineering functions as Hive UDFs/UDAFs/UDTFs. It is designed to scale
with both the number of training instances and the number of training
features.

Hivemall is very easy to use as every machine learning step is done
within HiveQL.

-- Installation is just as follows:
add jar /tmp/hivemall.jar;
source /tmp/define-all.hive;

-- Logistic regression is performed by a single query.
SELECT
  feature,
  avg(weight) as weight
FROM
  (SELECT logress(features,label) as (feature,weight)
   FROM training_features) t
GROUP BY feature;
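
As a rough sketch of how the averaged weights might then be used for
prediction, assume the output of the query above has been saved to a table
lr_model(feature, weight) and that the test data sits in a table
test_exploded(rowid, feature) with one row per active binary feature. Both
table names and the layout are hypothetical and not part of Hivemall. The
query uses only Hive built-in functions:

```sql
-- Hypothetical tables (assumptions, not shipped with Hivemall):
--   lr_model(feature, weight)    : averaged weights from the training query
--   test_exploded(rowid, feature): one row per active feature of a test row
SELECT
  t.rowid,
  -- logistic function applied to the sum of the matching weights;
  -- features missing from the model contribute nothing to the sum
  1.0 / (1.0 + exp(-1.0 * coalesce(sum(m.weight), 0.0))) as prob
FROM
  test_exploded t
  LEFT OUTER JOIN lr_model m ON (t.feature = m.feature)
GROUP BY t.rowid;
```

If features carry real values rather than being binary, each weight would
need to be multiplied by the feature value before summing.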

You can find detailed examples on our wiki pages.

We believe Hivemall is much easier to use and more scalable than Mahout
for classification/regression tasks, but please check for yourself. If
you have a Hive environment, you can evaluate Hivemall within 5 minutes
or so.

Hope you enjoy the release! Feedback (and pull requests) are always welcome.

Thank you,
