From: Nitin Pawar
To: user@hive.apache.org
Date: Fri, 11 Oct 2013 14:58:22 +0530
Subject: Re: [ANN] Hivemall: Hive scalable machine learning library

Just tried this for some hot trends in forum management. It was pretty impressive.

I will try it more deeply and, if possible, integrate it into my product.

Thanks for the awesome work.

Nitin

On Fri, Oct 11, 2013 at 12:58 PM, Makoto YUI wrote:
> Hi,
>
> I added support for state-of-the-art classifiers (ones not yet
> supported in Mahout), as well as Hivemall's cute(!?) logo, in Hivemall
> 0.1-rc3.
>
> Newly supported classifiers include
> - Confidence Weighted (CW)
> - Adaptive Regularization of Weight Vectors (AROW)
> - Soft Confidence Weighted (SCW1, SCW2)
>
> These classifiers are much smarter than the standard SGD-based or
> passive-aggressive classifiers. Please check them out for yourself.
>
> Thanks,
> Makoto
>
>
> (2013/10/11 4:28), Clark Yang (杨卓荦) wrote:
>
>> It looks really cool; I think I will try it out.
>>
>> Cheers,
>> Zhuoluo (Clark) Yang
>>
>>
>> 2013/10/5 Makoto YUI
>>
>>     Hi Edward,
>>
>>     Thank you for your interest.
>>
>>     The Hivemall project has no plans for a dedicated mailing
>>     list; I will answer further questions/comments on Twitter or
>>     through GitHub issues (with a question label).
>>
>>     BTW, I just added a CTR (Click-Through-Rate) prediction example
>>     using the dataset provided by a commercial search engine provider
>>     for KDDCup 2012 track 2.
>>     https://github.com/myui/hivemall/wiki/KDDCup-2012-track-2-CTR-prediction-dataset
>>
>>     I guess many of you are working on ad CTR/CVR prediction. This example
>>     might help you understand how to do it entirely within Hive.
>>
>>     Thanks,
>>     Makoto @myui
>>
>>
>>     (2013/10/04 23:02), Edward Capriolo wrote:
>>
>>         Looks cool, I'm already starting to play with it.
>>
>>         On Friday, October 4, 2013, Makoto Yui wrote:
>>         > Hi Dean,
>>         >
>>         > Thank you for your interest in Hivemall.
>>         >
>>         > Twitter's paper actually influenced me in developing Hivemall, and I
>>         > initially implemented such functionality as Pig UDFs.
>>         >
>>         > Though my Pig ML library is not released, you can find a similar
>>         > attempt for Pig in
>>         > https://github.com/y-tag/java-pig-MyUDFs
>>         >
>>         > Thanks,
>>         > Makoto
>>         >
>>         > 2013/10/3 Dean Wampler:
>>         >>
>>         >> This is great news! I know that Twitter has done something similar
>>         >> with UDFs for Pig, as described in this paper:
>>         >>
>>         >> http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf
>>         >>
>>         >> I'm glad to see the same thing start with Hive.
>>         >>
>>         >> Dean
>>         >>
>>         >>
>>         >> On Wed, Oct 2, 2013 at 10:21 AM, Makoto YUI wrote:
>>         >>>
>>         >>> Hello all,
>>         >>>
>>         >>> My employer, AIST, has given the thumbs up to open source our
>>         >>> machine learning library, named Hivemall.
>>         >>>
>>         >>> Hivemall is a scalable machine learning library running on
>>         >>> Hive/Hadoop, licensed under the LGPL 2.1.
>>         >>>
>>         >>> https://github.com/myui/hivemall
>>         >>>
>>         >>> Hivemall provides machine learning functionality as well as feature
>>         >>> engineering functions through UDFs/UDAFs/UDTFs of Hive. It is
>>         >>> designed to scale with both the number of training instances and
>>         >>> the number of training features.
>>         >>>
>>         >>> Hivemall is very easy to use, as every machine learning step is
>>         >>> done within HiveQL.
>>         >>>
>>         >>> -- Installation is as follows:
>>         >>> add jar /tmp/hivemall.jar;
>>         >>> source /tmp/define-all.hive;
>>         >>>
>>         >>> -- Logistic regression is performed by a query.
>>         >>> SELECT
>>         >>>   feature,
>>         >>>   avg(weight) as weight
>>         >>> FROM
>>         >>>   (SELECT logress(features, label) as (feature, weight) FROM training_features) t
>>         >>> GROUP BY feature;
>>         >>>
>>         >>> You can find detailed examples on our wiki pages.
>>         >>> https://github.com/myui/hivemall/wiki/_pages
>>         >>>
>>         >>> Though we consider Hivemall much easier to use and more scalable
>>         >>> than Mahout for classification/regression tasks, please check it
>>         >>> for yourself. If you have a Hive environment, you can evaluate
>>         >>> Hivemall within 5 minutes or so.
>>         >>>
>>         >>> Hope you enjoy the release! Feedback (and pull requests) are
>>         >>> always welcome.
>>         >>>
>>         >>> Thank you,
>>         >>> Makoto
>>         >>
>>         >>
>>         >> --
>>         >> Dean Wampler, Ph.D.
>>         >> @deanwampler
>>         >> http://polyglotprogramming.com
>>         >

--
Nitin Pawar
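The logistic-regression query quoted above nests the `logress` UDTF inside a subquery and then takes `avg(weight)` grouped by feature. One plausible reading, stated here as an assumption rather than a description of Hivemall's internals, is model averaging: each task running `logress` trains on its own slice of `training_features` and emits (feature, weight) rows, and the outer `GROUP BY feature` averages those weights. The Python sketch below imitates that shape with plain SGD. The function names `sgd_logistic`, `average_models` and `predict`, the 0/1 labels, the binary string features, the learning rate and the epoch count are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sgd_logistic(rows, eta=0.1, epochs=5):
    """Hypothetical stand-in for one logress task: plain SGD on one partition.

    rows: list of (features, label); features are binary feature names,
    label is 0 or 1. Returns {feature: weight}, i.e. the (feature, weight)
    rows this partition would emit.
    """
    w = defaultdict(float)
    for _ in range(epochs):
        for features, label in rows:
            p = sigmoid(sum(w[f] for f in features))
            for f in features:
                w[f] += eta * (label - p)  # log-likelihood gradient for feature f
    return dict(w)


def average_models(models):
    # Mirrors GROUP BY feature / avg(weight): a feature is averaged only
    # over the partitions that emitted a row for it.
    sums, counts = defaultdict(float), defaultdict(int)
    for model in models:
        for f, weight in model.items():
            sums[f] += weight
            counts[f] += 1
    return {f: sums[f] / counts[f] for f in sums}


def predict(model, features):
    # Features missing from the model contribute weight 0.
    return sigmoid(sum(model.get(f, 0.0) for f in features))


# Two toy partitions, as if training_features were split across two tasks.
partitions = [
    [(["a", "b"], 1), (["c"], 0)],
    [(["a"], 1), (["b", "c"], 0)],
]
model = average_models([sgd_logistic(part) for part in partitions])
print(predict(model, ["a"]) > 0.5)  # True: "a" only ever appears with label 1
```

Averaging weights rather than retraining on the full table keeps each task independent, which fits the point in the announcement that everything runs as an ordinary Hive query; whether Hivemall's actual averaging matches this sketch exactly should be checked against its wiki.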
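Makoto's 0.1-rc3 message lists Adaptive Regularization of Weight Vectors (AROW) among the new classifiers. As a rough illustration of what separates it from a plain SGD or passive-aggressive update, here is a textbook AROW learner (Crammer, Kulesza and Dredze, 2009) with a diagonal covariance: each feature keeps a mean weight and a variance, an update happens only when the hinge loss is positive, and features that have been observed get smaller variances and therefore smaller future steps. This is a self-contained sketch, not Hivemall's implementation; the class name `DiagonalAROW`, the regularization value `r=0.1`, the toy data and the +1/-1 label convention are assumptions.

```python
from collections import defaultdict


class DiagonalAROW:
    """Textbook AROW with a diagonal covariance (illustrative, not Hivemall's code).

    Features are dicts {name: value}; labels are +1 / -1; r is the
    regularization parameter (the value used below is an arbitrary assumption).
    """

    def __init__(self, r=0.1):
        self.r = r
        self.mu = defaultdict(float)           # mean weight per feature
        self.sigma = defaultdict(lambda: 1.0)  # variance per feature, starts at 1

    def margin(self, x):
        return sum(self.mu[f] * v for f, v in x.items())

    def update(self, x, y):
        m = y * self.margin(x)
        if m >= 1.0:
            return  # hinge loss is zero: no update
        confidence = sum(self.sigma[f] * v * v for f, v in x.items())  # x^T Sigma x
        beta = 1.0 / (confidence + self.r)
        alpha = (1.0 - m) * beta
        for f, v in x.items():
            s = self.sigma[f]
            self.mu[f] += alpha * y * s * v          # step scaled by per-feature variance
            self.sigma[f] = s - beta * (s * v) ** 2  # variance shrinks for observed features

    def predict(self, x):
        return 1 if self.margin(x) >= 0 else -1


clf = DiagonalAROW(r=0.1)
data = [({"a": 1.0}, 1), ({"b": 1.0}, -1),
        ({"a": 1.0, "c": 1.0}, 1), ({"b": 1.0, "c": 1.0}, -1)]
for _ in range(3):
    for x, y in data:
        clf.update(x, y)
print([clf.predict(x) for x, _ in data])  # [1, -1, 1, -1]
```

The per-feature variance is what makes the update "confidence-aware": the shared feature `c` is pushed in both directions, but its shrinking variance keeps it from swamping the informative features `a` and `b`.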