hive-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-5916) No need to aggregate statistics collected via counter mechanism
Date Mon, 02 Dec 2013 22:52:35 GMT

    [ https://issues.apache.org/jira/browse/HIVE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837038#comment-13837038 ]

Ashutosh Chauhan commented on HIVE-5916:
----------------------------------------

Currently, TableScanOp publishes statistics to the JobTracker with aggrKey as the counter group
name, the actual statistics type (numRows etc.) as the counter name, and the statistic's value as
the counter value. We can simply use the statistics type as both the counter group name and the
counter name; then we do not need to invoke stats aggregation from the Hive client when the query
finishes. This has the following advantages:
* The client doesn't need to do any aggregation. After retrieving statistics from the JobTracker
(via JobClient), it can add them directly to the metastore.
* It lowers the memory footprint on the JobTracker: instead of holding counters per task per
partition, it will hold counters per partition only.
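
The difference between the two schemes can be sketched with a toy counter store. This is only
an illustration, not Hive or Hadoop code: CounterStore is a hypothetical stand-in for the
framework's counter bookkeeping, and the key layouts (partition name plus a task suffix versus
partition name alone) are assumptions chosen to mirror the "per task per partition" versus
"per partition only" contrast above.

```java
import java.util.HashMap;
import java.util.Map;

public class CounterSketch {

    /** Hypothetical stand-in for a counter framework: increments to the same (group, name) are summed. */
    static final class CounterStore {
        private final Map<String, Long> counters = new HashMap<>();

        void increment(String group, String name, long delta) {
            counters.merge(group + "/" + name, delta, Long::sum);
        }

        long get(String group, String name) {
            return counters.getOrDefault(group + "/" + name, 0L);
        }

        /** Number of distinct counters held, a rough proxy for memory footprint. */
        int size() {
            return counters.size();
        }
    }

    /** Assumed "before" layout: one counter per task per partition; the client sums them itself. */
    static long publishPerTask(CounterStore store, String partition, long[] rowsPerTask) {
        for (int task = 0; task < rowsPerTask.length; task++) {
            store.increment(partition + "/task" + task, "numRows", rowsPerTask[task]);
        }
        long total = 0;
        for (int task = 0; task < rowsPerTask.length; task++) {
            total += store.get(partition + "/task" + task, "numRows"); // client-side aggregation
        }
        return total;
    }

    /** Assumed "after" layout: every task increments the same counter; the store does the summing. */
    static long publishPerPartition(CounterStore store, String partition, long[] rowsPerTask) {
        for (long rows : rowsPerTask) {
            store.increment(partition, "numRows", rows);
        }
        return store.get(partition, "numRows"); // already aggregated, ready for the metastore
    }

    public static void main(String[] args) {
        long[] rowsPerTask = {10, 20, 5};

        CounterStore before = new CounterStore();
        long totalBefore = publishPerTask(before, "ds=2013-12-01", rowsPerTask);

        CounterStore after = new CounterStore();
        long totalAfter = publishPerPartition(after, "ds=2013-12-01", rowsPerTask);

        System.out.println("per-task keys: total=" + totalBefore + ", counters held=" + before.size());
        System.out.println("shared key:    total=" + totalAfter + ", counters held=" + after.size());
    }
}
```

Both layouts yield the same total, but the shared-key layout holds a single counter per partition
and leaves nothing for the client to sum.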

> No need to aggregate statistics collected via counter mechanism 
> ----------------------------------------------------------------
>
>                 Key: HIVE-5916
>                 URL: https://issues.apache.org/jira/browse/HIVE-5916
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>    Affects Versions: 0.13.0
>            Reporter: Ashutosh Chauhan
>
> This results in unnecessary computations and waste of cluster resources which is not
> required since aggregation of counter is anyway done by JobTracker.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
