hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "David Morel" <>
Subject Re: Difference in number of row observstions from distinct and group by
Date Mon, 25 Nov 2013 13:46:18 GMT
On 25 Nov 2013, at 9:06, Mayank Bansal wrote:

> Hi,
> I was also thinking that this might be the case. For that reason I ran 
> this query
> Select * from (select col1,col2,col3,count(*)  as val from table_name
> group by col1,col2,col3)a where a.val>1 ;
> The output that I receive from this query is blank, then I ended up 
> doing count(*) and I got the same number of rows as originally in the 
> table. Please help me figure this out.

Instead of going circles, I would put the 2 result sets in 2 tables 
(with a concatenated PK made of your 3 columns, with some separator like 
'-'), and do a left outer join of table 1 on table 2.

you'd be able to identify quickly what went wrong. Sort the result so 
you get unlikely dupes, and all. Just trial and error until you nail it.


View raw message