hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dima Machlin <Dima.Mach...@pursway.com>
Subject RE: merge columns and count no of records
Date Thu, 23 Jan 2014 10:29:07 GMT
Counting in your case isn't practical as the expected count can be greater than the amount
of rows which isn't possible using the count function.

What you can do is multiply the amount of rows by the amount of columns.
This can be done using a different table (like dual if you'd create it)
Let assume you have the following table :
Create table temp (num int);
And it contains values from 1 to N (N rows).
(You can simply create this table)

Now you can do the following :

select count(distinct case when temp.n=1 then t.colA else t.colB end)
from table t join temp on (temp.n<=2)

This way your table with be joined with two rows containing the numbers 1 and 2
So in your example this :
Column A

COLUMN B

1

2

2

3

5

6

4

7

1

2

4

2


After the join will become :
Column A

COLUMN B

Temp.n

1

2

1

2

3

1

5

6

1

4

7

1

1

2

1

4

2

1

1

2

2

2

3

2

5

6

2

4

7

2

1

2

2

4

2

2








And after the Case you will have :
Column A

COLUMN B

Temp.n

1



1

2



1

5



1

4



1

1



1

4



1



2

2



3

2



6

2



7

2



2

2



2

2








And these columns will be merged into one and count distinct will be applied.

From: Bogala, Chandra Reddy [mailto:Chandra.Bogala@gs.com]
Sent: Thursday, January 09, 2014 7:17 AM
To: 'user@hive.apache.org'
Subject: RE: merge columns and count no of records

Or Is it good idea to get data into shell variable/file and doing processing. Or using a pig
script to do?

hive -e 'select distinct(columnA), distinct(columnB)  from blah' | sed 's/[\t]/,/g' >/tmp/test


Thanks,
Chandra


From: Bogala, Chandra Reddy [Tech]
Sent: Wednesday, January 08, 2014 5:49 PM
To: 'user@hive.apache.org'
Subject: merge columns and count no of records

Hi,
My requirement is to merge ( not concat )  two columns and count number of distinct records.
I can use self-join on column A and column B and can count number of records.
But  looks not optimal way of doing. Is there any better way to do.

Ex: Original table
Column A

COLUMN B

1

2

2

3

5

6

4

7

1

2

4

2


Logic something like this: Count(Distinct(Merge (distinct(A),distinct(B))))
Query OUTPUT should be :7
Values {1,2,3,4,5,6,7}

Thanks,
Chandra




************************************************************************************
This footnote confirms that this email message has been scanned by
PineApp Mail-SeCure for the presence of malicious code, vandals & computer viruses.
************************************************************************************
=

Mime
View raw message