hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Moore, Douglas" <Douglas.Mo...@thinkbiganalytics.com>
Subject Re: view over partitioned table
Date Mon, 16 Mar 2015 14:19:50 GMT
Mich,

What version of Hive are you running?
Have you seen this?
https://cwiki.apache.org/confluence/display/Hive/PartitionedViews

- Douglas
From: Mich Talebzadeh <mich@peridale.co.uk<mailto:mich@peridale.co.uk>>
Reply-To: <user@hive.apache.org<mailto:user@hive.apache.org>>
Date: Sun, 15 Mar 2015 19:01:57 +0000
To: <user@hive.apache.org<mailto:user@hive.apache.org>>, 'cobby cohen' <ququrico@yahoo.com<mailto:ququrico@yahoo.com>>
Subject: RE: view over partitioned table

OK,

This is the way I read it. Crearte table t1 .. partitioned by date will use horizontal partitioning
as per common with any RDBMS say Oracle.

The view I will create it as follows:

hive> create view v1 as select * from t1;
OK
Time taken: 0.073 seconds

hive> analyze table t1 partition (date) compute statistics;
Query ID = hduser_20150315184646_af3890b2-4079-4f81-b38b-da9af67b2751
Total jobs = 1
Launching Job 1 out of 1
-----
Stage-Stage-0: Map: 1   Cumulative CPU: 0.6 sec   HDFS Read: 299 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 600 msec
OK

hive> insert into table t1 partition (date)  values (1,'2015-04-04');
hive> insert into table t1 partition (date)  values (1,'2015-05-04');

Help to update stats

hive> analyze table t1 partition (date) compute statistics;

select from the view after exiting hive and back


hive> explain select id from v1 where date = '2015-04-04';
OK
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: t1
          Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: id (type: string)
            outputColumnNames: _col0
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
            ListSink

Time taken: 0.627 seconds, Fetched: 17 row(s)


Now select the same from base table

hive> explain select id from t1 where date = '2015-04-04';
OK
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: t1
          Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: id (type: string)
            outputColumnNames: _col0
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
            ListSink

Time taken: 0.597 seconds, Fetched: 17 row(s)

The plan looks the same but it is slower on the view

HTH

Mich

NOTE: The information in this email is proprietary and confidential. This message is for the
designated recipient only, if you are not the intended recipient, you should destroy it immediately.
Any information in this message shall not be understood as given or endorsed by Peridale Ltd,
its subsidiaries or their employees, unless expressly so stated. It is the responsibility
of the recipient to ensure that this email is virus free, therefore neither Peridale Ltd,
its subsidiaries nor their employees accept any responsibility.

From: cobby cohen [mailto:ququrico@yahoo.com]
Sent: 15 March 2015 17:52
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: view over partitioned table

in the following case:

create table T1(id string)
partitioned by (date date);

create view V1(id, date)
select id, date from T1

if i do the following
select id from V1 where date = '2015-04-04'
will the query optimize on T1's partitioning?

and what if V1 is a bit more complex, further depending on other views (all of which expose
the partitioned column as-is)?

thx!

Mime
View raw message