hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Igor Tatarinov <i...@decide.com>
Subject Re: Does the "GROUP BY" need to be strict in Hive?
Date Wed, 19 Oct 2011 20:44:22 GMT
Yes, MySQL is not SQL-standard compliant here but I really like this
feature. A lot of times I know that a group-by column is a key for some
other columns I am selecting. It seems pointless to add all those other
columns to the group-by list.

Other times, I use this 'hole' in MySQL to pick a (random) value for a
non-group-by column. In fact, MySQL returns the first value in each
grouping, so
SELECT pid, seller, price
FROM (
  SELECT pid, seller, price
  FROM offers
  ORDER by pid, price) t
GROUP BY pid;

returns the lowest seller/price for each product id. (This is not a
documented feature though but it seems to work. It's safer to do the same
with variables or a (slow) join.)

igor
decide.com

On Wed, Oct 19, 2011 at 1:06 PM, Mark Kerzner <mark.kerzner@shmsoft.com>wrote:

> Hi,
>
> I think I've isolated my Hive GROUP BY problem to this question,
>
> In Hive, the GROUP BY needs to be strict. I mean
>
> hive> select property_id["property_id"], log_timestamp from trans group by
> property_id["property_id"];
> FAILED: Error in semantic analysis: Line 1:35 Expression not in GROUP BY
> key log_timestamp
>
> and that error message means it.
>
> However, a similar quiery in MySQL
>
> SELECT ext_booking_id, product_id FROM offer_sets group by ext_booking_id;
>
> works just fine.
>
> HIve seems to be on the right about this, and MySQL should not allow this.
>
> What do you all think?
>
> Thank you,
> Mark
>

Mime
View raw message