hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robin Verlangen <ro...@us2.nl>
Subject Re: collecting lists in non-natural order / SORT BY on columns which are not part of the final result set
Date Sat, 13 Jun 2015 14:52:26 GMT
Hi Michael,

You can try using a with statement, pseudo:

WITH input AS (SELECT colA, colB FROM table ORDER BY colA ASC)
SELECT colB FROM input

Best regards,

Robin Verlangen
*Chief Data Architect*

W http://www.robinverlangen.nl
E robin@us2.nl

<http://goo.gl/Lt7BC>
*What is CloudPelican? <http://goo.gl/HkB3D>*

Disclaimer: The information contained in this message and attachments is
intended solely for the attention and use of the named addressee and may be
confidential. If you are not the intended recipient, you are reminded that
the information remains the property of the sender. You must not use,
disclose, distribute, copy, print or rely on this e-mail. If you have
received this message in error, please contact the sender immediately and
irrevocably delete this message and any copies.

On Sat, Jun 13, 2015 at 3:50 PM, Michael Häusler <michael@akatose.de> wrote:

> Hi there,
>
> imagine you have a table of time-series transactions, e.g.:
> CREATE TABLE
>     foobar (actor BIGINT, ts BIGINT, action STRING);
>
> containing the rows:
> 1       2000    bar
> 1       1000    foo
> 2       1500    foo
>
> An interesting query would be to get a denormalized view on all actions of
> an actor sorted by timestamp:
> 1       [foo, bar]
> 2       [foo]
>
> This would require collecting the actions in a list in non-natural order.
> A naive query would look like this:
> SET hive.map.aggr = false;
> SELECT
>     actor,
>     COLLECT_LIST(action) AS actions
> FROM
>     foobar
> GROUP BY
>     actor
> SORT BY
>     actor, ts;
>
> Unfortunately, as the column "ts" is not in the final result set, this
> produces a SemanticException [Error 10004]: Line 9:11 Invalid table alias
> or column reference 'ts'.
>
>
> I do understand that it is difficult to allow a global ORDER BY on column
> that is not in the result set.
> But the SORT BY only needs to ensure order within the reduce group. Is
> there a way to get this behaviour in Hive?
>
> Best regards
> Michael
>

Mime
View raw message