hive-dev mailing list archives

From "Chengxiang Li (JIRA)" <>
Subject [jira] [Commented] (HIVE-8456) Support Hive Counter to collect spark job metric[Spark Branch]
Date Thu, 16 Oct 2014 05:29:34 GMT


Chengxiang Li commented on HIVE-8456:

1. Should we think of better names for the new classes? The naming (e.g. SparkCounterGroup
and SparkCounters) seems a little confusing to me.
The class names are inherited from their MR/Tez counterparts to keep them consistent. If the
names are confusing, we could open a ticket to rename the MR, Tez, and Spark counter classes
together in the future.
2. Have we defined all the counters in SparkCounters.initializeSparkCounters? For example,
it seems Operator.HIVECOUNTERFATAL isn't added there.
Not yet. It's not easy to find and register all the necessary counters at this point, so I
plan to register specific counters when enabling the features that depend on Spark counters.
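Spark accumulators have to be created on the driver before tasks can add to them, which is presumably why counters are registered up front in SparkCounters.initializeSparkCounters. The minimal Java sketch below illustrates "register only what an enabled feature needs"; the class CounterRegistrySketch, its methods, the group name "HIVE", and the counter name "CREATED_FILES" are all hypothetical and not the patch's actual API. Throwing on an unregistered counter is a design choice of this sketch, made so that a missing registration shows up immediately instead of being silently dropped.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch (not the HIVE-8456 API): counters are registered before the
// job runs, and only registered counters may be incremented.
public class CounterRegistrySketch {
    // group name -> (counter name -> value)
    private final Map<String, Map<String, AtomicLong>> groups = new HashMap<>();

    /** Registers a counter starting at 0; registering the same counter twice is a no-op. */
    public void register(String group, String name) {
        groups.computeIfAbsent(group, g -> new HashMap<>()).putIfAbsent(name, new AtomicLong());
    }

    /** Adds delta to a registered counter; throws for a counter that was never registered. */
    public void increment(String group, String name, long delta) {
        lookup(group, name).addAndGet(delta);
    }

    public long value(String group, String name) {
        return lookup(group, name).get();
    }

    private AtomicLong lookup(String group, String name) {
        Map<String, AtomicLong> counters = groups.get(group);
        AtomicLong counter = (counters == null) ? null : counters.get(name);
        if (counter == null) {
            throw new IllegalStateException("Counter not registered: " + group + "." + name);
        }
        return counter;
    }

    public static void main(String[] args) {
        CounterRegistrySketch counters = new CounterRegistrySketch();
        // Register only the counters required by features that are actually enabled.
        counters.register("HIVE", "CREATED_FILES");
        counters.increment("HIVE", "CREATED_FILES", 2);
        System.out.println(counters.value("HIVE", "CREATED_FILES")); // 2
        // counters.increment("HIVE", "FATAL", 1) would throw IllegalStateException here.
    }
}
```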
3. The Counter enum in operators doesn't seem to be used as a "Counter" in Hive. Rather, it's
just kept in statsMap: HashMap<Enum<?>, LongWritable>. Maybe we shouldn't add
them as SparkCounters? If we do want to wrap them as SparkCounters, there are other operators
to handle besides MapOperator, e.g. FilterOperator and JoinOperator also have such an enum.
I suppose statsMap is used here to gather table statistics, since Hive can use counters as
one option for storing table statistics. Hive can register a SparkCounter using the enum's
class name as the group name and the enum constant's name as the counter name, which is why
the SparkCounters API supports creating, getting, and incrementing counters with an Enum parameter.
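The enum-to-counter mapping described above can be sketched as follows. This is a minimal illustration, not the SparkCounters source: the class EnumCounterSketch, its methods, and the Counter enum's constants are hypothetical stand-ins. The enum overloads simply derive the group from key.getDeclaringClass().getName() and the counter name from key.name(), then delegate to the string-based methods. For a nested enum such as MapOperator.Counter, getName() returns the binary class name, which uses a `$` separator.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: counters addressed either by (group, name) strings or by an enum constant.
public class EnumCounterSketch {
    // Stand-in for an operator's Counter enum such as MapOperator.Counter (constants are illustrative).
    enum Counter { DESERIALIZE_ERRORS, RECORDS_IN }

    // group name -> (counter name -> value)
    private final Map<String, Map<String, Long>> groups = new HashMap<>();

    public void increment(String group, String name, long delta) {
        groups.computeIfAbsent(group, g -> new HashMap<>()).merge(name, delta, Long::sum);
    }

    // Enum overload: group = enum class name, counter = enum constant name.
    // getDeclaringClass() (not getClass()) keeps the group stable even for
    // constants that declare their own class body.
    public void increment(Enum<?> key, long delta) {
        increment(key.getDeclaringClass().getName(), key.name(), delta);
    }

    public long get(String group, String name) {
        return groups.getOrDefault(group, Collections.emptyMap()).getOrDefault(name, 0L);
    }

    public long get(Enum<?> key) {
        return get(key.getDeclaringClass().getName(), key.name());
    }

    public static void main(String[] args) {
        EnumCounterSketch counters = new EnumCounterSketch();
        counters.increment(Counter.DESERIALIZE_ERRORS, 1);
        counters.increment(Counter.DESERIALIZE_ERRORS, 1);
        System.out.println(counters.get(Counter.DESERIALIZE_ERRORS)); // 2
    }
}
```

Because the enum overloads only compute a (group, name) pair, an enum-keyed counter and the same counter addressed by strings are the same entry.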
4. Maybe we should always use HiveConf.ConfVars.HIVECOUNTERGROUP as the group name, rather
than the enum class name (key.getDeclaringClass().getName())?
The group and counter names are all inherited from the MR/Tez counterparts, where counters are
folded into different groups. I think we should keep that grouping consistent where it makes sense.
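To make the trade-off in question 4 concrete, the hypothetical sketch below tallies the same counter events under both naming schemes: one group per enum class (key.getDeclaringClass().getName()) versus a single fixed group. The value "HIVE" stands in for whatever HiveConf.ConfVars.HIVECOUNTERGROUP resolves to and is an assumption here. The enums FilterCounter and JoinCounter are invented and deliberately share a constant name, to show what happens when counters from different operators are folded into one group.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch comparing per-enum counter groups with a single shared group.
public class CounterGroupingSketch {
    // Two invented operator enums that happen to share the constant name PASSED.
    enum FilterCounter { PASSED, FILTERED }
    enum JoinCounter { PASSED, SPILLED }

    // Assumed stand-in for the value of HiveConf.ConfVars.HIVECOUNTERGROUP.
    static final String SINGLE_GROUP = "HIVE";

    static String groupFor(Enum<?> key, boolean singleGroup) {
        return singleGroup ? SINGLE_GROUP : key.getDeclaringClass().getName();
    }

    /** Counts each event under "group.counter", using the chosen grouping scheme. */
    static Map<String, Long> tally(boolean singleGroup, Enum<?>... events) {
        Map<String, Long> totals = new TreeMap<>();
        for (Enum<?> event : events) {
            totals.merge(groupFor(event, singleGroup) + "." + event.name(), 1L, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Enum<?>[] events = { FilterCounter.PASSED, JoinCounter.PASSED, FilterCounter.FILTERED };
        System.out.println(tally(false, events)); // three counters, one per (enum, constant)
        System.out.println(tally(true, events));  // two counters: the PASSED counts are merged
    }
}
```

With per-enum groups the two PASSED counters stay separate; with the single group they collapse into one HIVE.PASSED entry with a value of 2.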

> Support Hive Counter to collect spark job metric[Spark Branch]
> --------------------------------------------------------------
>                 Key: HIVE-8456
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Chengxiang Li
>            Assignee: Chengxiang Li
>              Labels: Spark-M3
>         Attachments: HIVE-8456.1-spark.patch, HIVE-8456.2-spark.patch
> Several Hive query metrics in Hive operators, such as CREATEDFILES and DESERIALIZE_ERRORS,
> are collected by Hive counters. Besides, Hive uses counters as an option to collect table
> stats. Spark supports accumulators, which are quite similar to Hive counters, so we could
> try to enable Hive counters based on them.
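The accumulator model the description relies on can be sketched in plain Java: each task adds to its own local counter, and the driver merges the per-task values after the tasks finish, so no task ever reads the global total. This is only an illustration of that add-then-merge pattern using a thread pool in place of Spark executors; it does not use Spark's API. The class, the LongCounter type, and the convention that a negative value stands for a row that failed to deserialize are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of accumulator-style counting: per-task local counters, merged on the "driver".
public class AccumulatorCounterSketch {
    /** Task-local counter: tasks only add to it; only the driver reads the merged value. */
    static final class LongCounter {
        private long value;
        void add(long delta) { value += delta; }
        void merge(LongCounter other) { value += other.value; }
        long value() { return value; }
    }

    /** Counts "bad" rows (negative values, standing in for deserialize errors) across partitions. */
    static long countBadRows(int[][] partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.length));
        try {
            List<Future<LongCounter>> tasks = new ArrayList<>();
            for (int[] partition : partitions) {
                tasks.add(pool.submit(() -> {
                    LongCounter local = new LongCounter(); // per-task copy
                    for (int row : partition) {
                        if (row < 0) local.add(1);
                    }
                    return local;
                }));
            }
            LongCounter total = new LongCounter(); // driver-side value
            for (Future<LongCounter> task : tasks) total.merge(task.get());
            return total.value();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[][] partitions = { {1, -1, 3}, {-2, 5}, {7} };
        System.out.println(countBadRows(partitions)); // 2
    }
}
```

Merging only on the driver avoids shared mutable state across tasks, which is the property that makes accumulators a natural fit for Hive's increment-only counters.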

This message was sent by Atlassian JIRA
