hive-user mailing list archives

From Guy Doulberg <>
Subject RE: Distinct in hive
Date Wed, 26 Jan 2011 15:02:26 GMT

That was it

From: Namit Jain []
Sent: Tuesday, January 25, 2011 7:04 PM
Subject: Re: Distinct in hive

Is there skew in the data?
You may want to set the parameter hive.groupby.skewindata to true.
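
A minimal sketch of how the setting might be applied for a single session, assuming a hypothetical table `events` with columns `group_key` and `item_id` (these names are placeholders, not taken from the original query):

```sql
-- Turn on Hive's skew handling for GROUP BY in this session only.
SET hive.groupby.skewindata=true;

-- Hypothetical query: the table and column names are illustrative placeholders.
SELECT group_key, COUNT(DISTINCT item_id) AS distinct_items
FROM events
GROUP BY group_key;
```

With the parameter enabled, Hive is expected to plan the aggregation as two MapReduce jobs rather than one, so that a heavily populated group key is partially aggregated across several reducers before the final aggregation. Whether this removes the memory pressure depends on how the data is actually distributed.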


From: Guy Doulberg
Date: Tue, 25 Jan 2011 08:25:36 -0800
Subject: Distinct in hive

We wrote a Hive query that calculates the number of distinct values within a GROUP BY.
On a small portion of the data it worked well, but when we ran the query over a large portion
of the data, it failed with OutOfMemory errors in some of the reducers.

How does the DISTINCT operator work in Hive? Does it use some sort of data structure
whose size is proportional to the number of distinct values?

Many thanks
