hive-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Work logged] (HIVE-21912) Implement BlacklistingLlapMetricsListener
Date Thu, 04 Jul 2019 07:30:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-21912?focusedWorklogId=272028&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-272028 ]

ASF GitHub Bot logged work on HIVE-21912:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 04/Jul/19 07:29
            Start Date: 04/Jul/19 07:29
    Worklog Time Spent: 10m 
      Work Description: pvary commented on pull request #698: HIVE-21912: Implement DisablingDaemonStatisticsHandler
URL: https://github.com/apache/hive/pull/698#discussion_r300262840
 
 

 ##########
 File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
 ##########
 @@ -4358,6 +4358,40 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
       "The listener which is called when new Llap Daemon statistics is received on AM side.\n" +
       "The listener should implement the " +
       "org.apache.hadoop.hive.llap.tezplugins.metrics.LlapMetricsListener interface."),
+    LLAP_TASK_SCHEDULER_BLACKLISTING_METRICS_LISTENER_MIN_SERVED_TASKS(
+      "hive.llap.task.scheduler.blacklisting.metrics.listener.min.served.tasks", 2000,
+      "If the number of tasks served by a node is below this number then we will ignore the node\n" +
+      "when calculating the status of the cluster.\n" +
+      "Only used if hive.llap.task.scheduler.am.collect.daemon.metrics.listener is set to\n" +
+      "org.apache.hadoop.hive.llap.tezplugins.metrics.BlacklistingLlapMetricsListener"),
+    LLAP_TASK_SCHEDULER_BLACKLISTING_METRICS_LISTENER_MIN_CHANGE_DELAY(
+      "hive.llap.task.scheduler.blacklisting.metrics.listener.min.change.delay", "300s",
+      new TimeValidator(TimeUnit.SECONDS),
+      "The minimum time which should elapse between blacklisting nodes, in seconds.\n" +
+      "Only used if hive.llap.task.scheduler.am.collect.daemon.metrics.listener is set to\n" +
+      "org.apache.hadoop.hive.llap.tezplugins.metrics.BlacklistingLlapMetricsListener"),
+    LLAP_TASK_SCHEDULER_BLACKLISTING_METRICS_LISTENER_TIME_THRESHOLD(
+      "hive.llap.task.scheduler.blacklisting.metrics.listener.time.threshold", 1.5f,
+      "If the average response time of this node divided by the average response time of all the other nodes\n" +
+      "is greater than this threshold and the other conditions are satisfied too,\n" +
+      "then this node should be blacklisted.\n" +
+      "Only used if hive.llap.task.scheduler.am.collect.daemon.metrics.listener is set to\n" +
+      "org.apache.hadoop.hive.llap.tezplugins.metrics.BlacklistingLlapMetricsListener"),
+    LLAP_TASK_SCHEDULER_BLACKLISTING_METRICS_LISTENER_EMPTY_EXECUTORS(
+      "hive.llap.task.scheduler.blacklisting.metrics.listener.empty.executors.threshold", 2.0f,
+      "If a node is slow (hive.llap.task.scheduler.blacklisting.metrics.listener.time.threshold)\n" +
+      "and the other nodes have\n" +
+      "hive.llap.task.scheduler.blacklisting.metrics.listener.empty.executors.threshold times more free executors\n" +
+      "than the configured executors of the slow node and the other conditions are satisfied too,\n" +
+      "then the node should be blacklisted\n" +
+      "Only used if hive.llap.task.scheduler.am.collect.daemon.metrics.listener is set to\n" +
+      "org.apache.hadoop.hive.llap.tezplugins.metrics.BlacklistingLlapMetricsListener"),
+    LLAP_TASK_SCHEDULER_BLACKLISTING_METRICS_LISTENER_MAX_LISTED_NODES(
+      "hive.llap.task.scheduler.blacklisting.metrics.listener.max.listed.nodes", 1,
+      "The maximum number of blacklisted nodes. If there are at least this number of blacklisted nodes\n" +
 
 Review comment:
   I literally spent half an hour trying to come up with good descriptions of the configuration
parameters. I would appreciate any help with clarifying them; I feel I am lacking in this area,
so I would be happy to copy-paste anyone's description :)
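To make the interplay of two of these settings concrete, here is a minimal Java sketch of how hive.llap.task.scheduler.blacklisting.metrics.listener.min.change.delay (default "300s" in the diff) and hive.llap.task.scheduler.blacklisting.metrics.listener.max.listed.nodes (default 1) could gate a blacklisting decision. The class BlacklistGate and its methods are hypothetical and not part of Hive. Because the max.listed.nodes description is cut off in the diff above, treating it as a hard cap on concurrently blacklisted nodes is an assumption.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Set;

// Hypothetical helper (not part of Hive) showing how a minimum delay between
// blacklisting decisions and a cap on blacklisted nodes could be enforced.
final class BlacklistGate {
  private final Duration minChangeDelay;      // cf. ...min.change.delay, "300s" in the diff
  private final int maxListedNodes;           // cf. ...max.listed.nodes, 1 in the diff
  private Instant lastChange = Instant.EPOCH; // no blacklisting has happened yet

  BlacklistGate(Duration minChangeDelay, int maxListedNodes) {
    this.minChangeDelay = minChangeDelay;
    this.maxListedNodes = maxListedNodes;
  }

  /** Returns true if one more node may be blacklisted at {@code now}. */
  boolean mayBlacklist(Set<String> blacklistedHosts, Instant now) {
    // Assumption: once the cap is reached, no further nodes are blacklisted
    if (blacklistedHosts.size() >= maxListedNodes) {
      return false;
    }
    // Require at least minChangeDelay since the previous blacklisting
    return !now.isBefore(lastChange.plus(minChangeDelay));
  }

  /** Call after a node has actually been blacklisted. */
  void recordChange(Instant now) {
    lastChange = now;
  }
}
```

Keeping the timestamp of the last change inside the gate means the slow-node detection itself can stay stateless and simply ask the gate before acting.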
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 272028)
    Time Spent: 3h  (was: 2h 50m)

> Implement BlacklistingLlapMetricsListener
> -----------------------------------------
>
>                 Key: HIVE-21912
>                 URL: https://issues.apache.org/jira/browse/HIVE-21912
>             Project: Hive
>          Issue Type: Sub-task
>          Components: llap, Tez
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-21912.patch, HIVE-21912.wip-2.patch, HIVE-21912.wip.patch
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> We should implement a DaemonStatisticsHandler which disables a limping node if:
>  * its average response time is more than 150% (configurable) of the other nodes' average, and
>  * the other nodes have enough empty executors to handle its requests.
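The check described above can be sketched in Java as follows. This is a hypothetical illustration, not the BlacklistingLlapMetricsListener code from the patch: the NodeStats record, the pooled average over the other nodes, and the strict ">" comparisons are assumptions. The parameters mirror the min.served.tasks (2000), time.threshold (1.5) and empty.executors.threshold (2.0) settings from the HiveConf diff.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "limping node" check; not the Hive implementation.
final class SlowNodeDetector {

  /** Per-daemon statistics assumed for this sketch. */
  record NodeStats(String host, long servedTasks, long totalResponseTimeMs,
                   int configuredExecutors, int freeExecutors) {
    double averageResponseTimeMs() {
      return servedTasks == 0 ? 0.0 : (double) totalResponseTimeMs / servedTasks;
    }
  }

  private SlowNodeDetector() {}

  static List<String> findSlowNodes(List<NodeStats> nodes, long minServedTasks,
      float timeThreshold, float emptyExecutorsThreshold) {
    // Nodes that served too few tasks are left out of the cluster statistics
    List<NodeStats> eligible = new ArrayList<>();
    for (NodeStats n : nodes) {
      if (n.servedTasks() >= minServedTasks) {
        eligible.add(n);
      }
    }

    List<String> slow = new ArrayList<>();
    for (int i = 0; i < eligible.size(); i++) {
      NodeStats candidate = eligible.get(i);
      long otherTasks = 0;
      long otherTimeMs = 0;
      long otherFreeExecutors = 0;
      for (int j = 0; j < eligible.size(); j++) {
        if (j == i) {
          continue;
        }
        NodeStats other = eligible.get(j);
        otherTasks += other.servedTasks();
        otherTimeMs += other.totalResponseTimeMs();
        otherFreeExecutors += other.freeExecutors();
      }
      if (otherTasks == 0) {
        continue; // no other node to compare against
      }
      // Pooled average response time of all the other nodes (an assumption)
      double othersAverageMs = (double) otherTimeMs / otherTasks;
      boolean tooSlow = candidate.averageResponseTimeMs() > timeThreshold * othersAverageMs;
      // The other nodes must have enough free executors to take over the work
      boolean othersCanAbsorb =
          otherFreeExecutors > emptyExecutorsThreshold * candidate.configuredExecutors();
      if (tooSlow && othersCanAbsorb) {
        slow.add(candidate.host());
      }
    }
    return slow;
  }
}
```

With these defaults, a node averaging 200 ms per task while the others average 100 ms counts as slow, but it is only flagged if the remaining eligible nodes together have more than twice its configured executor count free.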



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
