hadoop-hdfs-issues mailing list archives

From "Erik Krogen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-14084) Need for more stats in DFSClient
Date Fri, 01 Feb 2019 22:21:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758718#comment-16758718
] 

Erik Krogen commented on HDFS-14084:
------------------------------------

Hey folks, I was testing out this patch and noticed an issue that I consider pretty serious.
When this is used in a standard DFS client, the {{DefaultMetricsSystem}} singleton has never
been initialized, so there is no proper prefix to use for configurations, and several pieces
of test-only code are triggered. For example:
{code:title=MetricsSystemImpl#register()}
    final String finalName = // be friendly to non-metrics tests
        DefaultMetricsSystem.sourceName(name2, !monitoring);
{code}
With your patch, this is triggered when {{monitoring}} is false, which is really only intended
for testing AFAICT. This is the first instance I'm aware of that leverages metrics2 for
client-side metrics. I think it means that we need to add a {{DefaultMetricsSystem.init("client")}}
call in the instantiation of {{Client}}. It will need a corresponding {{shutdown()}}, probably
in {{Client#close()}}. Unfortunately, both of these methods are expected to be called only
once, so we probably need to add a new mechanism for "conditional initialization" that
initializes the system only on the first call.
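
One way the "conditional initialization" idea could look is a small reference-counted wrapper: the first caller triggers initialization and the last caller to release triggers shutdown. This is only a sketch under assumptions, not part of the patch. The class name {{RefCountedLifecycle}} and its methods are hypothetical, and the two {{Runnable}} hooks stand in for whatever metrics system init and shutdown calls end up being used.

```java
import java.util.Objects;

/**
 * Hypothetical helper (not an existing Hadoop class): runs the init hook
 * only on the first acquire() and the shutdown hook only on the last
 * matching release().
 */
final class RefCountedLifecycle {
    private final Runnable init;      // assumed to wrap the one-time init call
    private final Runnable shutdown;  // assumed to wrap the one-time shutdown call
    private int refCount = 0;

    RefCountedLifecycle(Runnable init, Runnable shutdown) {
        this.init = Objects.requireNonNull(init, "init");
        this.shutdown = Objects.requireNonNull(shutdown, "shutdown");
    }

    /** Called when a client is created; initializes only if no client is active. */
    synchronized void acquire() {
        if (refCount == 0) {
            // If init throws, refCount stays at 0 and a later call can retry.
            init.run();
        }
        refCount++;
    }

    /** Called when a client is closed; shuts down only when the last client leaves. */
    synchronized void release() {
        if (refCount == 0) {
            throw new IllegalStateException("release() without matching acquire()");
        }
        refCount--;
        if (refCount == 0) {
            shutdown.run();
        }
    }

    /** Number of currently active holders, mainly useful for tests. */
    synchronized int refCount() {
        return refCount;
    }
}
```

Under this sketch, {{Client}} would call {{acquire()}} in its constructor and {{release()}} in {{close()}} on a shared static instance, so several clients in the same JVM would initialize and shut down the metrics system at most once per active period.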

> Need for more stats in DFSClient
> --------------------------------
>
>                 Key: HDFS-14084
>                 URL: https://issues.apache.org/jira/browse/HDFS-14084
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.0.0
>            Reporter: Pranay Singh
>            Assignee: Pranay Singh
>            Priority: Minor
>         Attachments: HDFS-14084.001.patch, HDFS-14084.002.patch, HDFS-14084.003.patch,
HDFS-14084.004.patch, HDFS-14084.005.patch, HDFS-14084.006.patch, HDFS-14084.007.patch, HDFS-14084.008.patch,
HDFS-14084.009.patch, HDFS-14084.010.patch, HDFS-14084.011.patch, HDFS-14084.012.patch, HDFS-14084.013.patch,
HDFS-14084.014.patch, HDFS-14084.015.patch, HDFS-14084.016.patch, HDFS-14084.017.patch, HDFS-14084.018.patch
>
>
> The usage of HDFS has changed from being a map-reduce filesystem to being more of a
general-purpose filesystem. In most cases the issues are with the
Namenode, so we have metrics to know the workload or stress on the Namenode.
> However, there is a need to collect more statistics for different operations/RPCs
in DFSClient to know which RPC operations take longer or how frequently
each operation occurs. These statistics can be exposed to the users of DFSClient, who can periodically
log them or apply some sort of flow control if responses are slow. This will also help isolate
HDFS issues in a mixed environment where, say, Spark, HBase and Impala run
together on one node. We can check the throughput of different operations across clients and isolate
problems caused by a noisy neighbor, network congestion, or a shared JVM.
> We have dealt with several problems from the field for which there is no conclusive evidence
as to what caused the problem. If we had metrics or stats in DFSClient we would be better
equipped to solve such complex problems.
> List of jiras for reference:
> -------------------------
>  HADOOP-15538, HADOOP-15530 (client-side deadlock)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
