Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 231AAE3FB for ; Fri, 18 Jan 2013 04:58:25 +0000 (UTC) Received: (qmail 50508 invoked by uid 500); 18 Jan 2013 04:58:23 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 50366 invoked by uid 500); 18 Jan 2013 04:58:22 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 50347 invoked by uid 99); 18 Jan 2013 04:58:22 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 18 Jan 2013 04:58:22 +0000 X-ASF-Spam-Status: No, hits=2.2 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (athena.apache.org: 64.78.56.61 is neither permitted nor denied by domain of rdm@baynote.com) Received: from [64.78.56.61] (HELO hub023-ca-3.exch023.serverdata.net) (64.78.56.61) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 18 Jan 2013 04:58:17 +0000 Received: from MBX023-W1-CA-9.exch023.domain.local ([10.254.8.59]) by HUB023-CA-3.exch023.domain.local ([10.254.8.36]) with mapi id 14.02.0318.001; Thu, 17 Jan 2013 20:57:56 -0800 From: Robin Morris To: "user@hive.apache.org" Subject: Re: question about machine learning on Hive Thread-Topic: question about machine learning on Hive Thread-Index: AQHN9PkCxWq8eqCSaUmw4/SqZQnGYJhOj7GA///3HAA= Date: Fri, 18 Jan 2013 04:57:55 +0000 Message-ID: <32822BCE93377948BFFA87F763238C0F34A9CBAD@mbx023-w1-ca-9.exch023.domain.local> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [67.180.20.157] Content-Type: multipart/alternative; 
boundary="_000_32822BCE93377948BFFA87F763238C0F34A9CBADmbx023w1ca9exch_" MIME-Version: 1.0 X-Virus-Checked: Checked by ClamAV on apache.org

--_000_32822BCE93377948BFFA87F763238C0F34A9CBADmbx023w1ca9exch_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

In a similar way, ML algorithms can be put into a Hive UDAF. I'm working on this at the moment, and it's proved quite straightforward to integrate liblinear into a UDAF. As Igor notes, by setting the number of reducers, you can set the number of parallel learners.

Robin
www.baynote.com

From: Igor Tatarinov
Reply-To: "user@hive.apache.org"
Date: Thursday, January 17, 2013 1:29 PM
To: "user@hive.apache.org"
Subject: Re: question about machine learning on Hive

Here is how Twitter does it with Pig:
http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf

We use a similar approach and I think that Pig, being somewhat lower-level with better support of nested objects, is a better tool than Hive. It should be possible to do something similar with Hive but we haven't tried. The trick is to implement the learner as a serializer. Then, the number of reducers will determine how many parallel learners (bags) you can run.

igor
decide.com

On Thu, Jan 17, 2013 at 1:23 PM, qiaoresearcher wrote:

How to run machine learning algorithms (whatever ML algorithms) directly in Hive? assume the input and output already stored as Hive tables.

ps: I know mahout is available there, but would prefer run machine learning algorithms directly in Hive

many thanks,

--_000_32822BCE93377948BFFA87F763238C0F34A9CBADmbx023w1ca9exch_
Content-Type: text/html; charset="us-ascii"
Content-ID: <0E27DE0029DFE34A8EDC4355B2695953@exch023.domain.local>
Content-Transfer-Encoding: quoted-printable
In a similar way, ML algorithms can be put into a Hive UDAF. I'm working on this at the moment, and it's proved quite straightforward to integrate liblinear into a UDAF. As Igor notes, by setting the number of reducers, you can set the number of parallel learners.
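
A minimal sketch of the "learner inside an aggregator" idea, written in Python so it runs standalone rather than against Hive's Java API. Everything here is an assumption for illustration: the class name `PerceptronAggregator` is hypothetical, the `iterate`/`terminate` method names only loosely echo a UDAF's per-row and final steps, a plain perceptron stands in for liblinear, and the partial-aggregation and merge steps that a real UDAF needs are left out.

```python
from typing import List, Sequence


class PerceptronAggregator:
    """Hypothetical stand-in for a learner wrapped in a UDAF-style aggregator."""

    def __init__(self, num_features: int, learning_rate: float = 1.0) -> None:
        self.weights: List[float] = [0.0] * num_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def iterate(self, features: Sequence[float], label: int) -> None:
        """Consume one row (label is +1 or -1) and update the model on a mistake."""
        score = self.bias + sum(w * x for w, x in zip(self.weights, features))
        if label * score <= 0:
            for i, x in enumerate(features):
                self.weights[i] += self.learning_rate * label * x
            self.bias += self.learning_rate * label

    def terminate(self) -> List[float]:
        """Return the trained model as the aggregate value: weights, then bias."""
        return self.weights + [self.bias]


def predict(model: Sequence[float], features: Sequence[float]) -> int:
    """Apply a model produced by terminate() to one feature vector."""
    *weights, bias = model
    score = bias + sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0 else -1


# Toy rows standing in for the rows one reducer would see: (features, label).
rows = [([2.0, 1.0], 1), ([1.5, 2.0], 1), ([-1.0, -1.5], -1), ([-2.0, -0.5], -1)]

aggregator = PerceptronAggregator(num_features=2)
for features, label in rows:  # single pass, as an aggregator sees each row once
    aggregator.iterate(features, label)
model = aggregator.terminate()
```

The aggregate returned here is the model itself, so each group of rows yields one trained learner.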

Robin
www.baynote.com

From: Igor Tatarinov <igor@decide.com>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Thursday, January 17, 2013 1:29 PM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Re: question about machine learning on Hive

Here is how Twitter does it with Pig:
http://www.umiacs.umd.edu/~jimmylin/publications/Lin_Kolcz_SIGMOD2012.pdf
We use a similar approach and I think that Pig, being somewhat lower-level with better support of nested objects, is a better tool than Hive. It should be possible to do something similar with Hive but we haven't tried. The trick is to implement the learner as a serializer. Then, the number of reducers will determine how many parallel learners (bags) you can run.
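
To make the "number of reducers = number of bags" point concrete, here is a rough Python simulation, not Pig or Hive code. It assumes rows carry an integer id and are assigned to a reducer by `row_id % num_reducers`, which is only a stand-in for however the framework actually distributes rows. The nearest-centroid learner and all function names are hypothetical choices made to keep the sketch short.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, float, int]  # (row_id, feature, label)


def partition_rows(rows: List[Row], num_reducers: int) -> Dict[int, List[Row]]:
    """Assign each row to one bag; modulo on the row id is an assumed scheme."""
    bags: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        bags[row[0] % num_reducers].append(row)
    return dict(bags)


def train_centroids(bag: List[Row]) -> Dict[int, float]:
    """Train one learner per bag: the mean feature value for each label."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Counter = Counter()
    for _, feature, label in bag:
        sums[label] += feature
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in counts}


def predict_one(model: Dict[int, float], feature: float) -> int:
    """Predict the label whose centroid is closest to the feature value."""
    return min(model, key=lambda label: abs(model[label] - feature))


def predict_ensemble(models: List[Dict[int, float]], feature: float) -> int:
    """Combine the per-bag learners by majority vote."""
    votes = Counter(predict_one(model, feature) for model in models)
    return votes.most_common(1)[0][0]


# Toy data: even ids are positive examples, odd ids are negative examples.
rows: List[Row] = [
    (i, 5.0 + 0.1 * i, 1) if i % 2 == 0 else (i, -(5.0 + 0.1 * i), -1)
    for i in range(12)
]

bags = partition_rows(rows, num_reducers=3)
models = [train_centroids(bag) for bag in bags.values()]
```

Changing `num_reducers` changes how many independent learners are trained, which is the knob described above; the voting step is one common way to combine them and is an assumption here.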

igor



On Thu, Jan 17, 2013 at 1:23 PM, qiaoresearcher <qiaoresearcher@gmail.com> wrote:

How to run machine learning algorithms (whatever ML algorithms) directly in Hive? assume the input and output already stored as Hive tables.

ps: I know mahout is available there, but would prefer run machine learning algorithms directly in Hive

many thanks, 



--_000_32822BCE93377948BFFA87F763238C0F34A9CBADmbx023w1ca9exch_--