hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From chandra Reddy Bogala <chandra.reddy2...@gmail.com>
Subject Re: merge columns and count no of records
Date Thu, 23 Jan 2014 16:24:10 GMT
awesome. Can the result of query ( count value) be saved in hive
variable(tmp variable)?. I will be running this query using "hive -f <query
file name>".


On Thu, Jan 23, 2014 at 3:59 PM, Dima Machlin <Dima.Machlin@pursway.com>wrote:

>  Counting in your case isn’t practical as the expected count can be
> greater than the amount of rows which isn’t possible using the count
> function.
>
>
>
> What you can do is multiply the amount of rows by the amount of columns.
>
> This can be done using a different table (like dual if you’d create it)
>
> Let assume you have the following table :
>
> Create table temp (num int);
>
> And it contains values from 1 to N (N rows).
>
> (You can simply create this table)
>
>
>
> Now you can do the following :
>
>
>
> select count(distinct case when temp.n=1 then t.colA else t.colB end)
>
> from table t join temp on (temp.n<=2)
>
>
>
> This way your table with be joined with two rows containing the numbers 1
> and 2
>
> So in your example this :
>
> Column A
>
> COLUMN B
>
> 1
>
> 2
>
> 2
>
> 3
>
> 5
>
> 6
>
> 4
>
> 7
>
> 1
>
> 2
>
> 4
>
> 2
>
>
>
> After the join will become :
>
> Column A
>
> COLUMN B
>
> Temp.n
>
> 1
>
> 2
>
> 1
>
> 2
>
> 3
>
> 1
>
> 5
>
> 6
>
> 1
>
> 4
>
> 7
>
> 1
>
> 1
>
> 2
>
> 1
>
> 4
>
> 2
>
> 1
>
> 1
>
> 2
>
> 2
>
> 2
>
> 3
>
> 2
>
> 5
>
> 6
>
> 2
>
> 4
>
> 7
>
> 2
>
> 1
>
> 2
>
> 2
>
> 4
>
> 2
>
> 2
>
>
>
>
>
>
>
>
>
> And after the Case you will have :
>
> Column A
>
> COLUMN B
>
> Temp.n
>
> 1
>
>
>
> 1
>
> 2
>
>
>
> 1
>
> 5
>
>
>
> 1
>
> 4
>
>
>
> 1
>
> 1
>
>
>
> 1
>
> 4
>
>
>
> 1
>
>
>
> 2
>
> 2
>
>
>
> 3
>
> 2
>
>
>
> 6
>
> 2
>
>
>
> 7
>
> 2
>
>
>
> 2
>
> 2
>
>
>
> 2
>
> 2
>
>
>
>
>
>
>
>
>
> And these columns will be merged into one and count distinct will be
> applied.
>
>
>
> *From:* Bogala, Chandra Reddy [mailto:Chandra.Bogala@gs.com]
> *Sent:* Thursday, January 09, 2014 7:17 AM
> *To:* 'user@hive.apache.org'
> *Subject:* RE: merge columns and count no of records
>
>
>
> Or Is it good idea to get data into shell variable/file and doing
> processing. Or using a pig script to do?
>
>
>
> hive -e 'select distinct(columnA), distinct(columnB)  from blah' | sed
> 's/[\t]/,/g' >/tmp/test
>
>
>
>
>
> Thanks,
>
> Chandra
>
>
>
>
>
> *From:* Bogala, Chandra Reddy [Tech]
> *Sent:* Wednesday, January 08, 2014 5:49 PM
> *To:* 'user@hive.apache.org'
> *Subject:* merge columns and count no of records
>
>
>
> Hi,
>
> My requirement is to merge ( not concat )  two columns and count number of
> distinct records. I can use self-join on column A and column B and can
> count number of records.
>
> But  looks not optimal way of doing. Is there any better way to do.
>
>
>
> Ex: Original table
>
> Column A
>
> COLUMN B
>
> 1
>
> 2
>
> 2
>
> 3
>
> 5
>
> 6
>
> 4
>
> 7
>
> 1
>
> 2
>
> 4
>
> 2
>
>
>
> Logic something like this: Count(Distinct(Merge (distinct(A),distinct(B))))
>
> Query OUTPUT should be :7
>
> Values {1,2,3,4,5,6,7}
>
>
>
> Thanks,
>
> Chandra
>
>
>
>
>
>
> ************************************************************************************
> This footnote confirms that this email message has been scanned by
> PineApp Mail-SeCure for the presence of malicious code, vandals & computer
> viruses.
>
> ************************************************************************************
>
> =
>

Mime
View raw message