hive-user mailing list archives

From Makoto YUI <yuin...@gmail.com>
Subject Re: [ANN] Hivemall: Hive scalable machine learning library
Date Fri, 11 Oct 2013 07:28:19 GMT
Hi,

I added support for state-of-the-art classifiers (not yet supported in
Mahout), along with Hivemall's cute(!?) logo, in Hivemall 0.1-rc3.

Newly supported classifiers include
- Confidence Weighted (CW)
- Adaptive Regularization of Weight Vectors (AROW)
- Soft Confidence Weighted (SCW1, SCW2)

These classifiers are much smarter than the standard SGD-based or
passive-aggressive classifiers. Please check them out for yourself.

Thanks,
Makoto

(2013/10/11 4:28), Clark Yang (杨卓荦) wrote:
> It looks really cool, I think I will try it out.
>
> Cheers,
> Zhuoluo (Clark) Yang
>
>
> 2013/10/5 Makoto YUI <yuin405@gmail.com>
>
>     Hi Edward,
>
>     Thank you for your interest.
>
>     The Hivemall project does not plan to have a dedicated mailing
>     list. I will answer questions and comments on Twitter or
>     through GitHub issues (with a question label).
>
>     BTW, I just added a CTR (click-through rate) prediction example that
>     uses the dataset provided by a commercial search engine provider for
>     KDD Cup 2012 track 2.
>     https://github.com/myui/hivemall/wiki/KDDCup-2012-track-2-CTR-prediction-dataset
>
>     I guess many of you are working on ad CTR/CVR prediction. This example
>     might help in understanding how to do it entirely within Hive.
>
>     Thanks,
>     Makoto @myui
>
>
>     (2013/10/04 23:02), Edward Capriolo wrote:
>
>         Looks cool, I'm already starting to play with it.
>
>         On Friday, October 4, 2013, Makoto Yui <yuin405@gmail.com> wrote:
>           > Hi Dean,
>           >
>           > Thank you for your interest in Hivemall.
>           >
>           > Twitter's paper actually influenced me in developing Hivemall and I
>           > initially implemented such functionality as Pig UDFs.
>           >
>           > Though my Pig ML library is not released, you can find a similar
>           > attempt for Pig in
>           > https://github.com/y-tag/java-pig-MyUDFs
>           >
>           > Thanks,
>           > Makoto
>           >
>           > 2013/10/3 Dean Wampler <deanwampler@gmail.com>:
>           >> This is great news! I know that Twitter has done something similar
>           >> with UDFs for Pig, as described in this paper:
>           >>
>           >> http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf
>
>           >>
>           >> I'm glad to see the same thing start with Hive.
>           >>
>           >> Dean
>           >>
>           >>
>           >> On Wed, Oct 2, 2013 at 10:21 AM, Makoto YUI <yuin405@gmail.com> wrote:
>           >>>
>           >>> Hello all,
>           >>>
>           >>> My employer, AIST, has given the thumbs up to open source our
>           >>> machine learning library, named Hivemall.
>           >>>
>           >>> Hivemall is a scalable machine learning library running on
>           >>> Hive/Hadoop, licensed under the LGPL 2.1.
>           >>>
>           >>> https://github.com/myui/hivemall
>           >>>
>           >>> Hivemall provides machine learning functionality as well as
>           >>> feature engineering functions through UDFs/UDAFs/UDTFs of Hive.
>           >>> It is designed to be scalable to the number of training instances
>           >>> as well as the number of training features.
>           >>>
>           >>> Hivemall is very easy to use as every machine learning step is
>           >>> done within HiveQL.
>           >>>
>           >>> -- Installation is just as follows:
>           >>> add jar /tmp/hivemall.jar;
>           >>> source /tmp/define-all.hive;
>           >>>
>           >>> -- Logistic regression is performed by a query.
>           >>> SELECT
>           >>>   feature,
>           >>>   avg(weight) as weight
>           >>> FROM
>           >>>  (SELECT logress(features,label) as (feature,weight) FROM
>           >>> training_features) t
>           >>> GROUP BY feature;
>           >>>
>           >>> You can find detailed examples on our wiki pages.
>           >>> https://github.com/myui/hivemall/wiki/_pages
>           >>>
>           >>> Though we consider that Hivemall is much easier to use and more
>           >>> scalable than Mahout for classification/regression tasks, please
>           >>> check it for yourself. If you have a Hive environment, you can
>           >>> evaluate Hivemall within 5 minutes or so.
>           >>>
>           >>> Hope you enjoy the release! Feedback (and pull requests) are
>           >>> always welcome.
>           >>>
>           >>> Thank you,
>           >>> Makoto
>           >>
>           >>
>           >>
>           >>
>           >> --
>           >> Dean Wampler, Ph.D.
>           >> @deanwampler
>           >> http://polyglotprogramming.com
>           >
>
>
>
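
For readers following the thread: the logistic regression query in the original announcement averages, per feature, the weights emitted by logress. The Python sketch below illustrates that averaging pattern outside Hive. It assumes (the thread does not state this) that each map task trains its own model by SGD on its share of the rows and emits (feature, weight) pairs. The names train_partition, average_weights, and predict, the learning rate, and the toy data are all hypothetical and are not part of Hivemall.

```python
import math
from collections import defaultdict


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_partition(rows, eta=0.1, epochs=20):
    """Simulate one map task: SGD logistic regression over its own rows.

    Each row is (feature_names, label) with binary features and a label
    in {0, 1}. Returns the (feature, weight) pairs the task would emit.
    This is an assumed stand-in for what logress does, not its real code.
    """
    w = defaultdict(float)
    for _ in range(epochs):
        for features, label in rows:
            p = sigmoid(sum(w[f] for f in features))
            # Learning rate times the log-likelihood gradient
            # for a binary feature (feature value 1).
            step = eta * (label - p)
            for f in features:
                w[f] += step
    return list(w.items())


def average_weights(emitted):
    """Rough equivalent of SELECT feature, avg(weight) ... GROUP BY feature."""
    total = defaultdict(float)
    count = defaultdict(int)
    for feature, weight in emitted:
        total[feature] += weight
        count[feature] += 1
    return {f: total[f] / count[f] for f in total}


def predict(model, features):
    """Probability of label 1; features missing from the model count as 0."""
    return sigmoid(sum(model.get(f, 0.0) for f in features))


if __name__ == "__main__":
    # Two hypothetical partitions of a toy training table.
    part1 = [(["bias", "a"], 1), (["bias", "b"], 0)]
    part2 = [(["bias", "a"], 1), (["bias", "b"], 0), (["bias", "a", "c"], 1)]
    emitted = train_partition(part1) + train_partition(part2)
    model = average_weights(emitted)
    print(sorted(model))
    print(predict(model, ["bias", "a"]) > 0.5)
```

Averaging per feature lets every task train independently and leaves the merge to an ordinary GROUP BY, which is presumably why the query in the announcement has that shape.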

