hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Grover <mgro...@oanda.com>
Subject Re: how to avoid scan the same table multi times?
Date Thu, 12 Jan 2012 16:33:02 GMT
Hi Xiaobin,
There is functionality like (from: https://cwiki.apache.org/Hive/languagemanual-dml.html):
Hive extension (multiple inserts):
FROM from_statement
INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
[INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...

This will do one pass through your table's partitions but group and filter them according
to different criteria. As you become more proficient with Hive, you could consider creating
an index on some columns or bucketing on those columns (for optimization in joins, for example)
if you end up filtering/grouping/joining on them over and over again.

Good luck!

Mark Grover, Business Intelligence Analyst
OANDA Corporation 

www: oanda.com www: fxtrade.com 
e: mgrover@oanda.com 

"Best Trading Platform" - World Finance's Forex Awards 2009. 
"The One to Watch" - Treasury Today's Adam Smith Awards 2009. 


----- Original Message -----
From: "Martin Kuhn" <martin.kuhn@affinitas.de>
To: user@hive.apache.org
Sent: Thursday, January 12, 2012 9:20:00 AM
Subject: Re: how to avoid scan the same table multi times?

Hi there,

> Select count(*), count(distinct u), type from t group by type where plat=1 and dt=”2012-1-12-02”
> Select count(*), count(distinct u), type from t where (type =2 or type =6) and dt=”2012-1-12-02”
group by type;

> Is there a better way to do these queries?

You could try something like this:

SELECT
    type
  , count(*)
  , count(DISTINCT u)
  , count(CASE WHEN plat=1 THEN u ELSE NULL)
  , count(DISTINCT CASE WHEN plat=1 THEN u ELSE NULL)
  , count(CASE WHEN (type=2 OR type=6) THEN u ELSE NULL)
  , count(DISTINCT CASE WHEN (type=2 OR type=6) THEN u ELSE NULL)
FROM
    t
WHERE
    dt in ("2012-1-12-02", "2012-1-12-03")
GROUP BY
    type
ORDER BY
    type
;

Good luck :)
Martin Kuhn


P.S.  You'ge got a strange date format there. For sorting purposes it would be more appropriate
to use something like "2012-01-12-02".


Mime
View raw message