hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Saurabh Nanda <saurabhna...@gmail.com>
Subject Re: counting different regexes in a single pass
Date Mon, 27 Jul 2009 18:59:14 GMT
I think you can do that with regex replace and GROUP BY. Something
like this- select replaced_col, count(1) from (select
regex_replace(original_col, '.*(text1|text2|text3).*', '$1') as
replaced_col from table)

On 7/28/09, Andraz Tori <andraz@zemanta.com> wrote:
> Hi,
> I am a beginner at Hive's SQL so I am sorry if this question is answered
> somewhere else. I tried to find the answer in Wiki, but no luck.
>
> I have a dataset in which one of the columns is text. I need to count
> number of records that match certain regex on that column. There are a
> number of different regexes that I need to count records for and it
> seems that there should be a way to do it in one pass through the data
> with hive.
>
> So how do I do it?
>
> This doesn't seem to work (on Hive 0.3):
> select sum(textcolumn LIKE "%%sometext%%") as CountA, sum(textcolumn
> LIKE "%%othertext%%") as CountB from sometable;
>
>
> Thank you for your help in advance!
>
>
> [unrelated: We've written a simple AWS S3 log format deserializer for
> Hive. It needs some polishing - if anyone needs it or wants to polish it
> up for inclusion, let me know]
>
> --
> Andraz Tori, CTO
> Zemanta Ltd, New York, London, Ljubljana
> www.zemanta.com
> mail: andraz@zemanta.com
> tel: +386 41 515 767
> ´╗┐twitter: andraz, skype: minmax_test
>
>
>
>


-- 
http://nandz.blogspot.com
http://foodieforlife.blogspot.com

Mime
View raw message