hive-user mailing list archives

From Makoto YUI <yuin...@gmail.com>
Subject Re: [ANN] Hivemall: Hive scalable machine learning library
Date Fri, 04 Oct 2013 16:42:19 GMT
Hi Edward,

Thank you for your interest.

The Hivemall project has no plans for a dedicated mailing list. I
will answer further questions/comments on Twitter or through GitHub
issues (with a question label).

BTW, I just added a CTR (Click-Through-Rate) prediction example that uses
the dataset provided by a commercial search engine provider for KDD Cup 2012
track 2.
https://github.com/myui/hivemall/wiki/KDDCup-2012-track-2-CTR-prediction-dataset

I guess many of you are working on ad CTR/CVR prediction. This example
may help you understand how to do it entirely within Hive.

Thanks,
Makoto @myui

(2013/10/04 23:02), Edward Capriolo wrote:
> Looks cool, I'm already starting to play with it.
>
> On Friday, October 4, 2013, Makoto Yui <yuin405@gmail.com> wrote:
>  > Hi Dean,
>  >
>  > Thank you for your interest in Hivemall.
>  >
>  > Twitter's paper actually influenced me in developing Hivemall and I
>  > initially implemented such functionality as Pig UDFs.
>  >
>  > Though my Pig ML library is not released, you can find a similar
>  > attempt for Pig in
>  > https://github.com/y-tag/java-pig-MyUDFs
>  >
>  > Thanks,
>  > Makoto
>  >
>  > 2013/10/3 Dean Wampler <deanwampler@gmail.com>:
>  >> This is great news! I know that Twitter has done something similar with UDFs
>  >> for Pig, as described in this paper:
>  >>
>  >> http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf
>  >>
>  >> I'm glad to see the same thing start with Hive.
>  >>
>  >> Dean
>  >>
>  >>
>  >> On Wed, Oct 2, 2013 at 10:21 AM, Makoto YUI <yuin405@gmail.com> wrote:
>  >>>
>  >>> Hello all,
>  >>>
>  >>> My employer, AIST, has given the thumbs up to open source our machine
>  >>> learning library, named Hivemall.
>  >>>
>  >>> Hivemall is a scalable machine learning library running on Hive/Hadoop,
>  >>> licensed under the LGPL 2.1.
>  >>>
>  >>> https://github.com/myui/hivemall
>  >>>
>  >>> Hivemall provides machine learning functionality as well as feature
>  >>> engineering functions through UDFs/UDAFs/UDTFs of Hive. It is designed
>  >>> to be scalable to the number of training instances as well as the number
>  >>> of training features.
>  >>>
>  >>> Hivemall is very easy to use as every machine learning step is done
>  >>> within HiveQL.
>  >>>
>  >>> -- Installation is just as follows:
>  >>> add jar /tmp/hivemall.jar;
>  >>> source /tmp/define-all.hive;
>  >>>
>  >>> -- Logistic regression is performed by a query.
>  >>> SELECT
>  >>>   feature,
>  >>>   avg(weight) as weight
>  >>> FROM
>  >>>  (SELECT logress(features,label) as (feature,weight) FROM
>  >>> training_features) t
>  >>> GROUP BY feature;
>  >>>
>  >>> You can find detailed examples on our wiki pages.
>  >>> https://github.com/myui/hivemall/wiki/_pages
>  >>>
>  >>> Though we consider Hivemall much easier to use and more scalable
>  >>> than Mahout for classification/regression tasks, please check it for
>  >>> yourself. If you have a Hive environment, you can evaluate Hivemall
>  >>> within 5 minutes or so.
>  >>>
>  >>> Hope you enjoy the release! Feedback (and pull requests) are always welcome.
>  >>>
>  >>> Thank you,
>  >>> Makoto
>  >>
>  >>
>  >>
>  >>
>  >> --
>  >> Dean Wampler, Ph.D.
>  >> @deanwampler
>  >> http://polyglotprogramming.com
>  >
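
Note on the logistic regression query quoted above: its outer SELECT averages the (feature, weight) rows emitted by logress, grouped by feature. The Python sketch below illustrates that averaging step only. It assumes each task trains on its own slice of training_features and emits one row per feature it saw. The plain SGD trainer, the two sample partitions, and all function names are hypothetical stand-ins for illustration, not Hivemall's implementation.

```python
import math
from collections import defaultdict


def train_partition(rows, eta=0.1, epochs=20):
    """Hypothetical stand-in for one task's training pass.

    rows: list of (features, label); features is a list of feature names
    (each present feature has value 1.0) and label is 0 or 1.
    Uses plain SGD on the logistic loss, which is an assumption here.
    """
    w = defaultdict(float)
    for _ in range(epochs):
        for features, label in rows:
            z = sum(w[f] for f in features)
            p = 1.0 / (1.0 + math.exp(-z))
            g = label - p  # gradient of the log-likelihood w.r.t. z
            for f in features:
                w[f] += eta * g
    # Emit (feature, weight) rows, mirroring "as (feature, weight)".
    return list(w.items())


def average_weights(emitted):
    """Mirror of: SELECT feature, avg(weight) ... GROUP BY feature."""
    total = defaultdict(float)
    count = defaultdict(int)
    for feature, weight in emitted:
        total[feature] += weight
        count[feature] += 1
    return {f: total[f] / count[f] for f in total}


def predict(model, features):
    """Score a row with the averaged model (sigmoid of summed weights)."""
    z = sum(model.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-z))


# Two made-up partitions of the training table.
part1 = [(["ad:1", "bias"], 1), (["ad:2", "bias"], 0)]
part2 = [(["ad:1", "bias"], 1), (["ad:3", "bias"], 0)]

emitted = train_partition(part1) + train_partition(part2)
model = average_weights(emitted)
```

A feature seen by only one partition (ad:2 or ad:3 here) keeps that partition's weight unchanged, because avg divides by the number of rows emitted for that feature, not by the number of partitions.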

