Return-Path: Delivered-To: apmail-hadoop-hive-user-archive@minotaur.apache.org Received: (qmail 28459 invoked from network); 10 Oct 2009 18:49:30 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 10 Oct 2009 18:49:30 -0000 Received: (qmail 18134 invoked by uid 500); 10 Oct 2009 18:49:30 -0000 Delivered-To: apmail-hadoop-hive-user-archive@hadoop.apache.org Received: (qmail 18086 invoked by uid 500); 10 Oct 2009 18:49:30 -0000 Mailing-List: contact hive-user-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hive-user@hadoop.apache.org Delivered-To: mailing list hive-user@hadoop.apache.org Received: (qmail 18069 invoked by uid 99); 10 Oct 2009 18:49:29 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 10 Oct 2009 18:49:29 +0000 X-ASF-Spam-Status: No, hits=2.2 required=10.0 tests=HTML_MESSAGE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of zshao9@gmail.com designates 209.85.211.187 as permitted sender) Received: from [209.85.211.187] (HELO mail-yw0-f187.google.com) (209.85.211.187) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 10 Oct 2009 18:49:18 +0000 Received: by ywh17 with SMTP id 17so4444434ywh.2 for ; Sat, 10 Oct 2009 11:47:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:content-type; bh=QMb1mllxcNp80tz70bT7/UzBNvsGb5DsDZsvcUAZWEU=; b=k19LLFHRUmyG8EyZQRwZTkCKHHEVUDtWPl1s6vQtbLaA1FZLapnGq44uQ5NAbjhKYM 7Reb+25kgYg84QCwpIKAjl0zvGgOR0hRoOD44mt1+WpkUqEP3lFRvlWl3fJQWBqCY5YY 2K4ul8i+q/hpl1lptAirKrsrRJVzNbcUk+DqY= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; b=lhn1Xy0zQB5ji0dzLx8rVRW+MxT5l8WZoqtS3uvVd8Hhw70U97s/aBvEUTmMCdSw2w tIzh5ZplbcXBjiNCy2ErrtZbLxFa4hY1YefTyg9I0U++CzOUb+ZKKd9wopjdXrz2eGx3 vm3kmXSvC/hsyEOb0rKVWm8/OQPd50C3RoS9k= MIME-Version: 1.0 Received: by 10.101.81.12 with SMTP id i12mr4548678anl.151.1255200477245; Sat, 10 Oct 2009 11:47:57 -0700 (PDT) In-Reply-To: <6acdaf2a0910101142k568b0be4t2e2a505ab8f8ac4b@mail.gmail.com> References: <6acdaf2a0910101142k568b0be4t2e2a505ab8f8ac4b@mail.gmail.com> Date: Sat, 10 Oct 2009 11:47:57 -0700 Message-ID: <34fd060d0910101147s643f1a1am59573ef8a4fdac51@mail.gmail.com> Subject: Re: Multiple aggregated metrics in one query From: Zheng Shao To: hive-user@hadoop.apache.org Content-Type: multipart/alternative; boundary=001636ed65f7916fe50475992296 X-Virus-Checked: Checked by ClamAV on apache.org --001636ed65f7916fe50475992296 Content-Type: text/plain; charset=ISO-8859-1 Yes, we can do this: SELECT ip, SUM(IF(action = 'action1', 1, 0)), SUM(IF(action = 'action2', 1, 0)), SUM(IF(action = 'action3', 1, 0)) FROM mytable GROUP BY ip; For more details on IF, please refer to: http://dev.mysql.com/doc/refman/5.0/en/control-flow-functions.html#function_if Zheng On Sat, Oct 10, 2009 at 11:42 AM, Ryan LeCompte wrote: > Hello all, > > Very newto Hive (haven't even installed it yet!), but I had a use case that > I didn't see demonstrated in any of the tutorial/documentation that I've > read thus far. > > Let's say that I have apache logs that I want to process with Hadoop/Hive. > Of course there may be different types of log records all tying back to the > same user or IP address or other log attribute. Is there a way to submit a > SINGLE Hive query to get back results that may look like: > > > IP Action1Count Action2Count Action3Count > > .. where the different actions correspond to different log events for that > IP address. > > Do I have to submit 3 different Hive queries here or can I submit a single > Hive query? In a regular Java-based map/reduce job, I would have written a > custom Writable that would record counts for each of the different actions, > and submit it to the reducer using output.collect(IP, customWritable). Here > I wouldn't have to submit multiple map/reduce jobs, just 1. > > Thanks > Ryan > > -- Yours, Zheng --001636ed65f7916fe50475992296 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Yes, we can do this:

SELECT ip, SUM(IF(action =3D 'action1',= 1, 0)), SUM(IF(action =3D 'action2', 1, 0)), SUM(IF(action =3D = 9;action3', 1, 0))
FROM mytable
GROUP BY ip;

For more deta= ils on IF, please refer to: http://dev.mysql.com/doc/refman= /5.0/en/control-flow-functions.html#function_if

Zheng

On Sat, Oct 10, 2009 at 11:42 A= M, Ryan LeCompte <lecompte@gmail.com> wrote:
Hello all,

Very newto Hive (haven't even installed it yet!), but= I had a use case that I didn't see demonstrated in any of the tutorial= /documentation that I've read thus far.

Let's say that I hav= e apache logs that I want to process with Hadoop/Hive. Of course there may = be different types of log records all tying back to the same user or IP add= ress or other log attribute. Is there a way to submit a SINGLE Hive query t= o get back results that may look like:


IP Action1Count Action2Count Action3Count

.. where the diffe= rent actions correspond to different log events for that IP address.
Do I have to submit 3 different Hive queries here or can I submit a single= Hive query? In a regular Java-based map/reduce job, I would have written a= custom Writable that would record counts for each of the different actions= , and submit it to the reducer using output.collect(IP, customWritable). He= re I wouldn't have to submit multiple map/reduce jobs, just 1.

Thanks
Ryan




--
Yours,
Zheng<= br> --001636ed65f7916fe50475992296--