hadoop-common-issues mailing list archives

From "Mingliang Liu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-13305) Define common statistics names across schemes
Date Wed, 22 Jun 2016 00:26:58 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-13305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Liu updated HADOOP-13305:
-----------------------------------
    Attachment: HADOOP-13305.000.patch

The v0 patch:
- Defines the names of common file system operation statistics in an interface
- Refers to the common names in the {{DFSOpsCountStatistics}} and {{s3a/Statistic}} classes
- Makes the {{StorageStatistics}} abstract class return its scheme if it is scheme specific (which is usually the case, e.g. {{DFSOpsCountStatistics}}, {{s3a/Statistic}}, and {{FileSystemStorageStatistics}}).
Because the common names are shared across different file system schemes, downstream applications need this information to interpret and categorize the statistics easily.
- Adds a simple unit test for unique OpType names
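The three ideas in the v0 patch (a shared names interface, a scheme-aware statistics class, and a uniqueness check on OpType names) can be sketched in self-contained Java as below. This is a hypothetical illustration, not the code in HADOOP-13305.000.patch: the class names {{CommonNamesSketch}}, {{SchemeAwareStatistics}}, the constant values such as "op_create", and the method {{opTypeNamesAreUnique()}} are all assumptions chosen for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the v0 patch's ideas; names are illustrative, not the patch's code.
public class CommonNamesSketch {

    /** Statistic names shared by all file system schemes (assumed subset and values). */
    interface CommonStatisticNames {
        String OP_CREATE = "op_create";
        String OP_DELETE = "op_delete";
        String OP_GET_FILE_STATUS = "op_get_file_status";
        String OP_MKDIRS = "op_mkdirs";
        String OP_RENAME = "op_rename";
    }

    /** Minimal stand-in for a StorageStatistics subclass that reports its scheme. */
    static abstract class SchemeAwareStatistics {
        /** Returns the scheme, e.g. "hdfs" or "s3a", or null if not scheme specific. */
        abstract String getScheme();
    }

    /** Per-operation types, each keyed by a common statistic name. */
    enum OpType {
        CREATE(CommonStatisticNames.OP_CREATE),
        DELETE(CommonStatisticNames.OP_DELETE),
        GET_FILE_STATUS(CommonStatisticNames.OP_GET_FILE_STATUS),
        MKDIRS(CommonStatisticNames.OP_MKDIRS),
        RENAME(CommonStatisticNames.OP_RENAME);

        private final String symbol;

        OpType(String symbol) {
            this.symbol = symbol;
        }

        String getSymbol() {
            return symbol;
        }
    }

    /** Mirrors the idea of the patch's unit test: no two OpTypes may share a name. */
    static boolean opTypeNamesAreUnique() {
        Set<String> seen = new HashSet<>();
        for (OpType op : OpType.values()) {
            if (!seen.add(op.getSymbol())) {
                return false; // duplicate name found
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A scheme-specific statistics object reports which scheme its names belong to.
        SchemeAwareStatistics hdfsStats = new SchemeAwareStatistics() {
            @Override
            String getScheme() {
                return "hdfs";
            }
        };
        System.out.println(hdfsStats.getScheme() + " unique=" + opTypeNamesAreUnique());
        // prints "hdfs unique=true"
    }
}
```

Keying the enum by the shared constants (rather than by free-form strings) means a duplicate or misspelled name is caught in one place, and the uniqueness check fails fast if two operations are accidentally mapped to the same name.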

> Define common statistics names across schemes
> ---------------------------------------------
>
>                 Key: HADOOP-13305
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13305
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs
>    Affects Versions: 2.8.0
>            Reporter: Mingliang Liu
>            Assignee: Mingliang Liu
>             Fix For: 2.8.0
>
>         Attachments: HADOOP-13305.000.patch
>
>
> The {{StorageStatistics}} class provides a pretty general interface, i.e. {{getLong(name)}} and {{getLongStatistics()}}. There are no shared or standard names for the storage statistics, so the names accepted by {{getLong(name)}} are up to each storage statistics implementation. The problems:
> # For the common statistics, downstream applications expect the same statistic name across different storage statistics and/or file system schemes. Currently they have to use {{DFSOpsCountStorageStatistics#getLong("getStatus")}} and {{S3A.Statistics#getLong("get_status")}} to retrieve the getStatus operation stat.
> # Moreover, probing per-operation stats is hard if there are no standard/shared common names.
> It makes a lot of sense for different schemes to issue the per-operation stats under the same name. Meanwhile, every FS will have its own internal things to count, which can't be centrally defined or managed. But there are some common ones which would be easier to manage if they all had the same name.
> Another motivation is that having a common set of names here will encourage uniform instrumentation of all filesystems; it will also make it easier to analyze the output of runs, were the stats to be published to a "performance log" similar to the audit log. See Steve's work for S3 (e.g. [HADOOP-13171]).
> This JIRA is to track the effort of defining common StorageStatistics entry names. Thanks to [~cmccabe], [~stevel@apache.org], [~hitesh] and [~jnp] for offline discussion.
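To make the motivation concrete, the following self-contained Java sketch shows what a downstream application gains from shared names. It is a hypothetical illustration, not Hadoop code: the class {{MergeStats}}, the method {{merge}}, the assumed common-name set, and the counter names ("blocks_read", "object_list_requests", "op_get_file_status") are all illustrative assumptions. Statistics under a common name are summed across schemes, while each file system's internal counters are kept apart under a "scheme." prefix.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch for HADOOP-13305; all statistic names below are illustrative.
public class MergeStats {

    /** Names every scheme agrees to use for the same operation (assumed set). */
    static final Set<String> COMMON_NAMES = Set.of("op_create", "op_get_file_status");

    /**
     * Combines per-scheme statistics: common names are summed across schemes,
     * while scheme-internal counters are kept apart under a "scheme." prefix.
     */
    static Map<String, Long> merge(Map<String, Map<String, Long>> statsByScheme) {
        Map<String, Long> out = new TreeMap<>(); // sorted keys for stable output
        for (Map.Entry<String, Map<String, Long>> fs : statsByScheme.entrySet()) {
            String scheme = fs.getKey();
            for (Map.Entry<String, Long> stat : fs.getValue().entrySet()) {
                String name = stat.getKey();
                String key = COMMON_NAMES.contains(name) ? name : scheme + "." + name;
                out.merge(key, stat.getValue(), Long::sum);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Each scheme invents its own key for the same getStatus operation,
        // so the two counts can never be combined.
        Map<String, Long> divergent = merge(Map.of(
                "hdfs", Map.of("getStatus", 3L),
                "s3a", Map.of("get_status", 5L)));
        System.out.println(divergent); // {hdfs.getStatus=3, s3a.get_status=5}

        // Shared names aggregate; internal counters stay scheme specific.
        Map<String, Long> shared = merge(Map.of(
                "hdfs", Map.of("op_get_file_status", 3L, "blocks_read", 10L),
                "s3a", Map.of("op_get_file_status", 5L, "object_list_requests", 7L)));
        System.out.println(shared);
        // {hdfs.blocks_read=10, op_get_file_status=8, s3a.object_list_requests=7}
    }
}
```

The design choice mirrors the description: only the centrally agreed names are merged, so nothing forces a file system to give up its own internal counters, and the scheme prefix keeps those counters interpretable.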



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
