hadoop-hdfs-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Work logged] (HDDS-1935) Improve the visibility with Ozone Insight tool
Date Thu, 15 Aug 2019 20:56:01 GMT

     [ https://issues.apache.org/jira/browse/HDDS-1935?focusedWorklogId=295749&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-295749 ]

ASF GitHub Bot logged work on HDDS-1935:

                Author: ASF GitHub Bot
            Created on: 15/Aug/19 20:55
            Start Date: 15/Aug/19 20:55
    Worklog Time Spent: 10m 
      Work Description: adoroszlai commented on pull request #1255: HDDS-1935. Improve the
visibility with Ozone Insight tool
URL: https://github.com/apache/hadoop/pull/1255#discussion_r314484032

 File path: hadoop-ozone/insight/src/main/java/org/apache/hadoop/ozone/insight/List.java
 @@ -0,0 +1,38 @@
+package org.apache.hadoop.ozone.insight;
+import org.apache.hadoop.hdds.cli.HddsVersionProvider;
+import org.apache.hadoop.hdds.conf.OzoneConfiguration;
+import picocli.CommandLine;
+import java.util.Map;
+import java.util.concurrent.Callable;
+/**
+ * Subcommand to list the available insight points.
+ */
+@CommandLine.Command(
+    name = "list",
+    description = "Show available insight points.",
+    mixinStandardHelpOptions = true,
+    versionProvider = HddsVersionProvider.class)
+public class List extends BaseInsightSubcommand implements Callable<Void> {
+  @CommandLine.Parameters(defaultValue = "")
+  private String selection;
 Review comment:
   Do you plan to use this parameter, e.g. to filter the list of available insight points?

Issue Time Tracking

    Worklog Id:     (was: 295749)
    Time Spent: 1h 10m  (was: 1h)

> Improve the visibility with Ozone Insight tool
> ----------------------------------------------
>                 Key: HDDS-1935
>                 URL: https://issues.apache.org/jira/browse/HDDS-1935
>             Project: Hadoop Distributed Data Store
>          Issue Type: New Feature
>            Reporter: Elek, Marton
>            Assignee: Elek, Marton
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
> Visibility is a key aspect of operating any Ozone cluster. We need better visibility
to improve correctness and performance. While distributed tracing is a good tool for improving
the visibility of performance, we have no powerful tool which can be used to check the internal
state of the Ozone cluster and debug certain correctness issues.
> To improve the visibility of the internal components I propose to introduce a new command
line application `ozone insight`.
> The new tool will show the selected metrics / logs / configuration for any of the internal
components (like replication-manager, pipeline, etc.).
> For each insight point we can define the required logs and log levels, metrics and configuration,
and the tool can display only the component-specific information during debugging.
> h2. Usage
> First we can check the available insight points:
> {code}
> bash-4.2$ ozone insight list
> Available insight points:
>   scm.node-manager                     SCM Datanode management related information.
>   scm.replica-manager                  SCM closed container replication manager
>   scm.event-queue                      Information about the internal async event delivery
>   scm.protocol.block-location          SCM Block location protocol endpoint
>   scm.protocol.container-location      Planned insight point which is not yet implemented.
>   scm.protocol.datanode                Planned insight point which is not yet implemented.
>   scm.protocol.security                Planned insight point which is not yet implemented.
>   scm.http                             Planned insight point which is not yet implemented.
>   om.key-manager                       OM Key Manager
>   om.protocol.client                   Ozone Manager RPC endpoint
>   om.http                              Planned insight point which is not yet implemented.
>   datanode.pipeline[id]                More information about one ratis datanode ring.
>   datanode.rocksdb                     More information about one ratis datanode ring.
>   s3g.http                             Planned insight point which is not yet implemented.
> {code}
> Insight points can define configuration, metrics and/or logs. Configuration can be displayed
based on the configuration objects:
> {code}
> ozone insight config scm.protocol.block-location
> Configuration for `scm.protocol.block-location` (SCM Block location protocol endpoint)
> >>> ozone.scm.block.client.bind.host
>        default:
>        current:
> The hostname or IP address used by the SCM block client  endpoint to bind
> >>> ozone.scm.block.client.port
>        default: 9863
>        current: 9863
> The port number of the Ozone SCM block client service.
> >>> ozone.scm.block.client.address
>        default: ${ozone.scm.client.address}
>        current: scm
> The address of the Ozone SCM block client service. If not defined value of ozone.scm.client.address
is used
> {code}
> Metrics can be retrieved from the Prometheus endpoint:
> {code}
> ozone insight metrics scm.protocol.block-location
> Metrics for `scm.protocol.block-location` (SCM Block location protocol endpoint)
> RPC connections
>   Open connections: 0
>   Dropped connections: 0
>   Received bytes: 0
>   Sent bytes: 0
> RPC queue
>   RPC average queue time: 0.0
>   RPC call queue length: 0
> RPC performance
>   RPC processing time average: 0.0
>   Number of slow calls: 0
> Message type counters
>   Number of AllocateScmBlock: 0
>   Number of DeleteScmKeyBlocks: 0
>   Number of GetScmInfo: 2
>   Number of SortDatanodes: 0
> {code}
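> As a rough illustration of how values like these could be read from the Prometheus text exposition format, here is a minimal sketch. The class name `PromTextParser` and the sample metric names are hypothetical and not part of Ozone. The sketch assumes sample lines carry no trailing timestamp, and it skips values that `Double.parseDouble` cannot read (for example `+Inf`).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Ozone): parses Prometheus text exposition lines. */
class PromTextParser {

  /**
   * Maps each sample name (including any {labels}) to its numeric value.
   * Comment lines (# HELP / # TYPE) and blank lines are skipped.
   * Assumption: sample lines have no trailing timestamp.
   */
  static Map<String, Double> parse(String body) {
    Map<String, Double> metrics = new LinkedHashMap<>();
    for (String line : body.split("\n")) {
      String trimmed = line.trim();
      if (trimmed.isEmpty() || trimmed.startsWith("#")) {
        continue;
      }
      // Label values may contain spaces, so split at the last space
      // rather than the first one.
      int idx = trimmed.lastIndexOf(' ');
      if (idx < 0) {
        continue; // not a sample line
      }
      String name = trimmed.substring(0, idx).trim();
      String value = trimmed.substring(idx + 1);
      try {
        metrics.put(name, Double.parseDouble(value));
      } catch (NumberFormatException e) {
        // Unparseable value: skipped in this sketch.
      }
    }
    return metrics;
  }

  public static void main(String[] args) {
    // Hypothetical metric names, for illustration only.
    String sample = "# TYPE demo_open_connections gauge\n"
        + "demo_open_connections 0\n"
        + "demo_calls{type=\"GetScmInfo\"} 2\n";
    parse(sample).forEach((k, v) -> System.out.println(k + " = " + v));
  }
}
```

The tool would then only need to pick the samples that belong to the selected insight point and print them under readable labels.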
> Log levels can be adjusted with the existing logLevel servlet, and log lines can be collected /
streamed via a simple logstream servlet:
> {code}
> ozone insight log scm.node-manager
> [SCM] 2019-08-08 12:42:37,392 [DEBUG|org.apache.hadoop.hdds.scm.node.SCMNodeManager|SCMNodeManager]
Processing node report from [datanode=ozone_datanode_1.ozone_default]
> [SCM] 2019-08-08 12:43:37,392 [DEBUG|org.apache.hadoop.hdds.scm.node.SCMNodeManager|SCMNodeManager]
Processing node report from [datanode=ozone_datanode_1.ozone_default]
> [SCM] 2019-08-08 12:44:37,392 [DEBUG|org.apache.hadoop.hdds.scm.node.SCMNodeManager|SCMNodeManager]
Processing node report from [datanode=ozone_datanode_1.ozone_default]
> [SCM] 2019-08-08 12:45:37,393 [DEBUG|org.apache.hadoop.hdds.scm.node.SCMNodeManager|SCMNodeManager]
Processing node report from [datanode=ozone_datanode_1.ozone_default]
> [SCM] 2019-08-08 12:46:37,392 [DEBUG|org.apache.hadoop.hdds.scm.node.SCMNodeManager|SCMNodeManager]
Processing node report from [datanode=ozone_datanode_1.ozone_default]
> {code}
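> For the log level part, a minimal sketch of building the request URL for the existing hadoop-common `/logLevel` servlet, which takes `log` and `level` query parameters. The class name `LogLevelUrl` and the `scm:9876` address are assumptions for illustration. The sketch only builds the URL and does not send a request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

/** Hypothetical helper (not part of Ozone): builds a /logLevel servlet URL. */
class LogLevelUrl {

  /**
   * @param hostAndPort HTTP address of the component, e.g. "scm:9876" (assumed)
   * @param logger      fully qualified logger name
   * @param level       log level to set, e.g. DEBUG or TRACE
   */
  static URI build(String hostAndPort, String logger, String level) {
    try {
      String query = "log=" + URLEncoder.encode(logger, "UTF-8")
          + "&level=" + URLEncoder.encode(level, "UTF-8");
      return URI.create("http://" + hostAndPort + "/logLevel?" + query);
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is always available on the JVM.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(build("scm:9876",
        "org.apache.hadoop.hdds.scm.node.SCMNodeManager", "DEBUG"));
  }
}
```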
> The verbose mode can display the raw messages as well:
> {code}
> [SCM] 2019-08-08 13:16:37,398 [DEBUG|org.apache.hadoop.hdds.scm.node.SCMNodeManager|SCMNodeManager]
Processing node report from [datanode=ozone_datanode_1.ozone_default]
> [SCM] 2019-08-08 13:16:37,400 [TRACE|org.apache.hadoop.hdds.scm.node.SCMNodeManager|SCMNodeManager]
HB is received from [datanode=ozone_datanode_1.ozone_default]: 
> storageReport {
>   storageUuid: "DS-bffe6bee-1166-4502-acf5-57fc16c5aa98"
>   storageLocation: "/data/hdds"
>   capacity: 470282264576
>   scmUsed: 16384
>   remaining: 205695963136
>   storageType: DISK
>   failed: false
> }
> {code}
> h2. Use cases
> Ozone insight can be used for any kind of debugging. Some example problems from yesterday:
>  1. Due to a cache problem the volumes were created twice without any error at the second
time. With this tool I can check the state of the internal cache, or check if the volume is
added to the rocksdb itself.
>  2. After fixing this problem we found a DNS caching issue. The OM responded with an
error but it was not clear where the error was propagated from (it was created in OzoneManagerProtocolClientSideTranslatorPB.handleError).
By checking the traffic between SCM and OM it would be easy to track the origin of a specific
>  4. After fixing this problem we found a pipeline problem (reported later at HDDS-1933).
With this tool I could check the content of the reports and messages to the pipeline manager.
> h2. Implementation
> We can implement the tool without any significant code change as it uses existing features:
>  * Metrics can be downloaded from the `/prom` endpoint
>  * Log Level can be set with the existing `/logLevel` servlet endpoint (from hadoop-common)
>  * Log lines can be streamed with a very simple new servlet
>  * Configuration can be displayed based on configuration points
> A new interface can be introduced for `InsightPoint`s where all the affected logs/levels,
metrics and config classes can be defined for each component.
> Prometheus servlet endpoint can be changed to be turned on by default.
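> To make the `InsightPoint` idea more concrete, a minimal sketch of what such an interface might look like. The method names and the `LoggerSource` and `NodeManagerInsight` types are assumptions for illustration, not the actual API from the pull request. The logger name and the DEBUG / TRACE levels are taken from the log examples above.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of the proposed interface; method names are assumptions. */
interface InsightPoint {
  /** One-line description shown by the list subcommand. */
  String getDescription();

  /** Loggers and levels to enable; verbose mode may ask for more detail. */
  List<LoggerSource> getRelatedLoggers(boolean verbose);

  /** Names of the metrics to display for this component. */
  List<String> getMetricNames();

  /** Configuration classes whose keys should be displayed. */
  List<Class<?>> getConfigurationClasses();
}

/** A logger name together with the level it should be switched to. */
final class LoggerSource {
  final String loggerName;
  final String level;

  LoggerSource(String loggerName, String level) {
    this.loggerName = loggerName;
    this.level = level;
  }
}

/** Illustrative insight point for scm.node-manager. */
class NodeManagerInsight implements InsightPoint {
  @Override
  public String getDescription() {
    return "SCM Datanode management related information.";
  }

  @Override
  public List<LoggerSource> getRelatedLoggers(boolean verbose) {
    // TRACE adds the raw heartbeat messages, as in the verbose example.
    String level = verbose ? "TRACE" : "DEBUG";
    return Collections.singletonList(new LoggerSource(
        "org.apache.hadoop.hdds.scm.node.SCMNodeManager", level));
  }

  @Override
  public List<String> getMetricNames() {
    return Collections.emptyList(); // none defined in this sketch
  }

  @Override
  public List<Class<?>> getConfigurationClasses() {
    return Collections.emptyList(); // none defined in this sketch
  }
}
```

With a shape like this, each subcommand (`config`, `metrics`, `log`) would only need to look up the selected insight point and read the matching list.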
