hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: how to avoid scan the same table multi times?
Date Thu, 12 Jan 2012 21:00:37 GMT
From:
https://cwiki.apache.org/Hive/languagemanual-dml.html#LanguageManualDML-InsertingdataintoHiveTablesfromqueries

Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1,
partcol2=val2 ...)] select_statement1
[INSERT OVERWRITE TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;
FROM from_statement
INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2
...)] select_statement1
[INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]
[INSERT OVERWRITE TABLE tablename2 [PARTITION ...] select_statement2] ...;



2012/1/12 Xiaobin She <xiaobinshe@gmail.com>

> Hello, everyone,
>
> I'm new to hive, and I got some questions.
>
> I have a table like this:
>
> create table t(id int, time string, ip string, u bigint, ret int, plat
> int, type int, u2 bigint, ver int)  PARTITIONED BY(dt STRING)   ROW FORMAT
> DELIMITED FIELDS TERMINATED BY ','  lines TERMINATED BY '\n' ;
>
> and I will do lots of query on this table base on different value of the
> column, like:
>
>
> Select count(*), count(distinct u), type from t group by type where plat=1
> and dt=”2012-1-12-02”
>
> Select count(*), count(distinct u), type from t group by type where plat=2
> and dt=”2012-1-12-02”
>
> Select count(*), count(distinct u), type from t where (type =2 or type =6)
> and dt=”2012-1-12-02” group by type;
>
> Select count(*), count(distinct u), type from t where (type =1 or type =5)
> and dt=”2012-1-12-02” group by type;
>
> Select count(*), count(distinct u), type from t where (type =1 or type =5)
> and (dt=”2012-1-12-02” and dt=”2012-1-12-03”) group by type;
>
> but these queries seems not so effective, because they query on the same
> table for multiple times, and that meas it will scan the same files for
> many times.
>
> And my question is , how can I avoid this?
> Is there a better way to do these queries?
>
> Thank you very much for your help!
>
>

Mime
View raw message