hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Navdeep Agrawal <Navdeep_Agra...@symantec.com>
Subject RE: [HELP]Hive Statistics
Date Thu, 24 Jul 2014 15:09:45 GMT
Well the problem exactly didn’t get solved but I observed this kind of behavior is persistent
when I partition my table by date type otherwise its working . may its worth a issue .

Thank you

From: Navdeep Agrawal [mailto:Navdeep_Agrawal@symantec.com]
Sent: Thursday, July 24, 2014 1:22 PM
To: user@hive.apache.org
Subject: [HELP]Hive Statistics


Stuck .need help
I created a small table with multiple partition desc (id int ,term int) partitioned by id
,whenever I run analyze on any id I am getting perfectly good answers . I am unable to figure
out the difference each file is making .

New table
Table Parameters:
        transient_lastDdlTime   1406016417

# Storage Information
SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        serialization.format    1
inserting
insert into table statdevp partition(id) select id,term from statdev where id is not null
and term is not null
analyze
analyze table statdevp partition(id=11) compute statistics for columns id;

I am able to see all values in part_col_stat for the partitions I am running analyze

and the orginal table :
desc

Table Parameters:
        last_modified_by        XXXXXXXXX
        last_modified_time      1406047797
        transient_lastDdlTime   1406047797

# Storage Information
SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        serialization.format    1

load_inst_id            int                                                              src_filename
           string                                                           server_date  
          date                                                                           
                                                                          # Partition Information
                                                                 # col_name              data_type
              comment                                                                    
                                                      server_date             date

insertion
insert into devStat partition(server_date) select load_inst_id,src_filename,server_ts,server_date
from rrslog_ext where server_date='2013-11-01' and load_inst_id is not null  limit 100
analyze
analyze table devstat partition(server_date='2013-11-30') compute statistics for columns load_inst_id;

any help will be highly appreciated .stuck here long time …
thanks in advance .

From: Navdeep Agrawal [mailto:Navdeep_Agrawal@symantec.com]
Sent: Wednesday, July 23, 2014 3:17 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: RE: Hive Statistics

I think that is already configured since I am able to get statistics in other tables .
Because if I have not set these parameters why every time a new row is getting created  in
mysql db in part_col_stats table .

From: Andre Araujo [mailto:araujo@pythian.com]
Sent: Wednesday, July 23, 2014 1:22 PM
To: user
Subject: Re: Hive Statistics

Hi, Navdeep,

Please note that the configuration for the stats database is separate from the configuration
for the metastore db.
Can you confirm you have both to use a mysql db?

The properties for the stats db are:
hive.stats.dbclass=
hive.stats.dbconnectionstring=


On 23 July 2014 16:07, Navdeep Agrawal <Navdeep_Agrawal@symantec.com<mailto:Navdeep_Agrawal@symantec.com>>
wrote:
Thank you Nitin for reply. I am using mysql database ,and also I can see new row created for
the partition ,but all values are zero . I think explicitly giving mysql data base wont make
a difference .

From: Nitin Pawar [mailto:nitinpawar432@gmail.com<mailto:nitinpawar432@gmail.com>]
Sent: Tuesday, July 22, 2014 11:05 PM
To: user@hive.apache.org<mailto:user@hive.apache.org>
Subject: Re: Hive Statistics

by default hive stores the statistics in derby database.

If you want a persistent look at column statistics, you may want to create mysql based database
for column statistics.

Your queries look fine

On Tue, Jul 22, 2014 at 10:50 PM, Navdeep Agrawal <Navdeep_Agrawal@symantec.com<mailto:Navdeep_Agrawal@symantec.com>>
wrote:

Hi ,



i am trying to compute statistics on ORC File but i am unable see any changes in PART_COL_STATS
as well on using

set hive.compute.query.using.stats=true;
set hive.stats.reliable=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;

to get max value of a column it is running full Map reduce on column ..
what i want to use is max value stored in meta store ,but i am unable to catch these statistics
.

my table desc is
load_inst_id int
src_filename string
server_date date

my analyze query is
analyze table mytable partition(server_date=’2013-11-30′) compute statistics for columns
load_inst_id;

i am always getting 0 as loadinstant id ,i have to turn off my hive.compute.query.using.stats
to get correct result(through map reduce max(load_inst_id))




--
Nitin Pawar



--
André Araújo
Big Data Consultant/Solutions Architect
The Pythian Group - Australia - www.pythian.com<http://www.pythian.com>
Office (calls from within Australia): 1300 366 021 x1270
Office (international): +61 2 8016 7000  x270 OR +1 613 565 8696   x1270
Mobile: +61 410 323 559
Fax: +61 2 9805 0544
IM: pythianaraujo @ AIM/MSN/Y! or araujo@pythian.com<mailto:araujo@pythian.com> @ GTalk

“Success is not about standing at the top, it's the steps you leave behind.” — Iker
Pou (rock climber)


--


Mime
View raw message