hive-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-6500) Stats collection via filesystem
Date Tue, 25 Feb 2014 08:03:20 GMT

    [ https://issues.apache.org/jira/browse/HIVE-6500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13911365#comment-13911365
] 

Ashutosh Chauhan commented on HIVE-6500:
----------------------------------------

In FS-based stats collection, the idea is that each task writes the stats it has collected
to a file on the FS; these files are then aggregated after the job has finished.
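
A minimal sketch of that write-then-aggregate flow, for illustration only. It is not taken from the attached patch: the JSON file format, the one-file-per-task layout, the stat names, and the functions write_task_stats, aggregate_stats, and run_demo are all assumptions, and a local temporary directory stands in for the real filesystem.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def write_task_stats(stats_dir, task_id, stats):
    """Hypothetical helper: a task writes its own stats to its own file."""
    path = Path(stats_dir) / f"{task_id}.json"
    path.write_text(json.dumps(stats))
    return path


def aggregate_stats(stats_dir):
    """Hypothetical helper: after the job ends, sum all per-task files."""
    totals = Counter()
    for path in sorted(Path(stats_dir).glob("*.json")):
        # Counter.update adds the values for matching keys.
        totals.update(json.loads(path.read_text()))
    return dict(totals)


def run_demo():
    # A temporary local directory stands in for the shared filesystem.
    with tempfile.TemporaryDirectory() as tmp:
        write_task_stats(tmp, "task_0", {"numRows": 100, "rawDataSize": 2048})
        write_task_stats(tmp, "task_1", {"numRows": 50, "rawDataSize": 1024})
        return aggregate_stats(tmp)
```

Because each task writes only its own file, tasks need no coordination with one another, and the aggregation step is a single pass over the directory once the job is done.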

> Stats collection via filesystem
> -------------------------------
>
>                 Key: HIVE-6500
>                 URL: https://issues.apache.org/jira/browse/HIVE-6500
>             Project: Hive
>          Issue Type: New Feature
>          Components: Statistics
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>         Attachments: HIVE-6500.patch
>
>
> Recently, support for stats gathering via counters was [added | https://issues.apache.org/jira/browse/HIVE-4632].
> Although it is useful, it has the following issues:
> * [Length of counter group name is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L340]
> * [Length of counter name is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L337]
> * [Number of distinct counter groups is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L343]
> * [Number of distinct counters is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L334]
> Although these limits are configurable, setting them to higher values implies increased
> memory load on the AM and the job history server.
> Whether these limits make sense or not is [debatable | https://issues.apache.org/jira/browse/MAPREDUCE-5680],
> but it is desirable that Hive doesn't make use of the framework's counter feature, so that
> we can evolve this feature without relying on support from the framework. Filesystem-based
> counter collection is a step in that direction.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
