Return-Path: Delivered-To: apmail-hadoop-hive-user-archive@minotaur.apache.org Received: (qmail 89136 invoked from network); 27 Jul 2009 18:58:42 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 27 Jul 2009 18:58:42 -0000 Received: (qmail 23018 invoked by uid 500); 27 Jul 2009 18:59:47 -0000 Delivered-To: apmail-hadoop-hive-user-archive@hadoop.apache.org Received: (qmail 22977 invoked by uid 500); 27 Jul 2009 18:59:47 -0000 Mailing-List: contact hive-user-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hive-user@hadoop.apache.org Delivered-To: mailing list hive-user@hadoop.apache.org Received: (qmail 22968 invoked by uid 99); 27 Jul 2009 18:59:47 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 27 Jul 2009 18:59:47 +0000 X-ASF-Spam-Status: No, hits=-0.0 required=10.0 tests=SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of saurabhnanda@gmail.com designates 209.85.221.188 as permitted sender) Received: from [209.85.221.188] (HELO mail-qy0-f188.google.com) (209.85.221.188) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 27 Jul 2009 18:59:36 +0000 Received: by qyk26 with SMTP id 26so4179114qyk.5 for ; Mon, 27 Jul 2009 11:59:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:received:in-reply-to:references :date:message-id:subject:from:to:content-type :content-transfer-encoding; bh=e7A1WgSr05U6RbQE79UepoEBsarylnlVRSj9vpB24sg=; b=L1ucpn1OItco71bNMFPUzMF2BpqSrLrRUwRH+oUksh5IJv+N2yRzKneAe6nFKneXHB rEF2NN38t/SQyHbP43+mkyGK6zOSN4FtxOORrtK5oowcArlCIqANnrY6rX63EyhBX2F7 HNsEyq825I9vH70aRYDf0PZ4ASL5mmPJy8ya0= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type:content-transfer-encoding; b=nNh6Av5khEN89M76Ec4UX/MiCG505iBZevjIuqEEyl+X3OHGcinffVjYS8o8vC7A6V G48ykOpCacj8K04ZkMZ7VnmUxYZ4HtuMyTrBw3uBSR5MwRcIcteYs1IpP+zYIrtuDw+A B3Z9OXHgPMX1+R0qVBxRaCQYnvsCbBnqWxJWs= MIME-Version: 1.0 Received: by 10.220.92.14 with SMTP id p14mr3873210vcm.92.1248721154605; Mon, 27 Jul 2009 11:59:14 -0700 (PDT) In-Reply-To: <1248719723.6541.113.camel@minmax-laptop> References: <1248719723.6541.113.camel@minmax-laptop> Date: Tue, 28 Jul 2009 00:29:14 +0530 Message-ID: <794f042d0907271159j7f8c0335j132eb23985773ee9@mail.gmail.com> Subject: Re: counting different regexes in a single pass From: Saurabh Nanda To: hive-user@hadoop.apache.org Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Virus-Checked: Checked by ClamAV on apache.org I think you can do that with regex replace and GROUP BY. Something like this- select replaced_col, count(1) from (select regex_replace(original_col, '.*(text1|text2|text3).*', '$1') as replaced_col from table) On 7/28/09, Andraz Tori wrote: > Hi, > I am a beginner at Hive's SQL so I am sorry if this question is answered > somewhere else. I tried to find the answer in Wiki, but no luck. > > I have a dataset in which one of the columns is text. I need to count > number of records that match certain regex on that column. There are a > number of different regexes that I need to count records for and it > seems that there should be a way to do it in one pass through the data > with hive. > > So how do I do it? > > This doesn't seem to work (on Hive 0.3): > select sum(textcolumn LIKE "%%sometext%%") as CountA, sum(textcolumn > LIKE "%%othertext%%") as CountB from sometable; > > > Thank you for your help in advance! > > > [unrelated: We've written a simple AWS S3 log format deserializer for > Hive. It needs some polishing - if anyone needs it or wants to polish it > up for inclusion, let me know] > > -- > Andraz Tori, CTO > Zemanta Ltd, New York, London, Ljubljana > www.zemanta.com > mail: andraz@zemanta.com > tel: +386 41 515 767 > =EF=BB=BFtwitter: andraz, skype: minmax_test > > > > --=20 http://nandz.blogspot.com http://foodieforlife.blogspot.com