hive-user mailing list archives

From Rex X <dnsr...@gmail.com>
Subject What's the advised way to do groupby 2 attributes from a table with 1000 columns?
Date Sun, 27 Mar 2016 22:12:09 GMT
Given a table with 1000 columns:
    col1, col2, ..., col1000

The source table is about 1PB.

I only need to query 3 columns:

select col1, col2, sum(col3) as col3
from myTable
group by
col1, col2


Is it advisable to do a subquery first and then feed its output to the
group by aggregation, so that smaller files are sent to the group by?
I am not sure if Hive automatically takes care of this.

select col1, col2, sum(col3) as col3
from
    (select col1, col2, col3
     from myTable
    ) a
group by
col1, col2
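
For what it's worth, the two forms should return the same rows, so the question is only about how much data gets read and shuffled. Below is a minimal sketch of that equivalence check. It uses Python's built-in sqlite3 as a stand-in for Hive, with a made-up four-column table and sample rows, all of which are assumptions for illustration. It says nothing about how Hive plans or executes either query.

```python
import sqlite3

# Hypothetical miniature of myTable: the three queried columns plus one
# unused column (col4) standing in for the other 997.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable (col1 TEXT, col2 TEXT, col3 INTEGER, col4 TEXT)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [
        ("a", "x", 1, "unused"),
        ("a", "x", 2, "unused"),
        ("a", "y", 5, "unused"),
        ("b", "x", 7, "unused"),
    ],
)

# Form 1: group by directly on the wide table.
direct = """
    SELECT col1, col2, SUM(col3) AS col3
    FROM myTable
    GROUP BY col1, col2
"""

# Form 2: project the three columns in a subquery, then group by.
with_subquery = """
    SELECT col1, col2, SUM(col3) AS col3
    FROM (SELECT col1, col2, col3 FROM myTable) a
    GROUP BY col1, col2
"""

# Sort both results so the comparison does not depend on row order.
direct_rows = sorted(conn.execute(direct).fetchall())
subquery_rows = sorted(conn.execute(with_subquery).fetchall())

print(direct_rows == subquery_rows)
```

This only confirms that the results match. Whether the subquery reduces the data actually scanned in Hive is a separate matter that depends on the storage format and the optimizer.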
