hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14475) Metrics of S3A don't print out when enable it in Hadoop metrics property file
Date Wed, 21 Jun 2017 09:58:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057270#comment-16057270

Steve Loughran commented on HADOOP-14475:

bq. the name change of context just for distinguish with other attributes, such as MetricsRegistry
and Metrics name. From the following log, it shows using different names is better than ones
with the same name:
17/06/05 20:32:54 DEBUG impl.MetricsSinkAdapter: Pushing record S3AFileSystemMetrics.s3a.s3afilesystem to file

We should be OK staying with "S3AFileSystemMetrics" for now.

bq. 2.after i make a collection the relationship of those classes, i also think the functions
of class S3AFileSystemMetricsSystem can be merge into some existed class, maybe S3AFileSystem.

{{S3AFileSystem}} is *way too big* right now; we've been pulling everything out into its own
isolated classes wherever possible. It's a losing battle (look at the HADOOP-13345 branch),
but we try. Generally we're doing this with package-private classes which take
{{S3AFileSystem owner}} as a constructor argument.
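
As a rough illustration of that "helper class takes its owner in the constructor" pattern, here is a minimal sketch. {{OwnerFs}} and {{ReadMetrics}} are hypothetical stand-ins invented for this example; they are not the real {{S3AFileSystem}} or any class in the S3A module.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the large filesystem class; NOT the real S3AFileSystem.
class OwnerFs {
    private final String bucket;
    private final ReadMetrics metrics;

    OwnerFs(String bucket) {
        this.bucket = bucket;
        // The helper is handed its owner instead of living inside the big class.
        this.metrics = new ReadMetrics(this);
    }

    String getBucket() { return bucket; }

    ReadMetrics metrics() { return metrics; }
}

// Package-private helper (no "public" modifier): only code in the same package sees it.
final class ReadMetrics {
    private final OwnerFs owner;
    private final AtomicLong bytesRead = new AtomicLong();

    ReadMetrics(OwnerFs owner) {
        this.owner = owner;
    }

    void recordRead(long bytes) {
        bytesRead.addAndGet(bytes);
    }

    String describe() {
        return owner.getBucket() + ": bytesRead=" + bytesRead.get();
    }
}

class OwnerPatternDemo {
    public static void main(String[] args) {
        OwnerFs fs = new OwnerFs("example-bucket");
        fs.metrics().recordRead(128);
        fs.metrics().recordRead(64);
        System.out.println(fs.metrics().describe());
    }
}
```

The point of the split is that the metrics logic can grow and be tested on its own while the owner class only gains one field and one accessor.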

Regarding instances

* Calls to {{FileSystem.get(URI, conf)}} or {{Path.getFileSystem(conf)}} will return the shared
FS for that user.
* Unless, that is, the relevant configuration option to create unique instances on every call
(for S3A, {{fs.s3a.impl.disable.cache}}) has been set.
* We like to share FS instances to allow for sharing of thread pools (S3, Azure) and IPC channels
(HDFS), so unique instances are generally left for when you are changing the Configuration
settings and really want new instances.
* Ideally an MR/Hive/Spark job should have one instance per user per JVM.
* The MR job can then call {{FileSystem.getStatistics()}} after the run to get the statistics
for every FS in the JVM, which we can then aggregate across the entire job.
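
To make the sharing behaviour concrete, here is a toy model of a per-user instance cache. It is only a sketch under stated assumptions: {{ToyFileSystem}} and {{ToyFsCache}} are hypothetical names, the key is simplified to scheme + authority + user, and none of this is Hadoop's actual cache code.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical stand-ins, not Hadoop's FileSystem or its cache.
final class ToyFileSystem {
    final URI uri;
    final String user;

    ToyFileSystem(URI uri, String user) {
        this.uri = uri;
        this.user = user;
    }
}

final class ToyFsCache {
    private final Map<String, ToyFileSystem> cache = new HashMap<>();
    private final boolean cacheDisabled;

    // cacheDisabled plays the role of the "unique instance on every call" option.
    ToyFsCache(boolean cacheDisabled) {
        this.cacheDisabled = cacheDisabled;
    }

    synchronized ToyFileSystem get(URI uri, String user) {
        if (cacheDisabled) {
            return new ToyFileSystem(uri, user);
        }
        // Simplified key: same store (scheme + authority) and same user share an instance.
        String key = uri.getScheme() + "://" + uri.getAuthority() + "#" + user;
        return cache.computeIfAbsent(key, k -> new ToyFileSystem(uri, user));
    }
}

class ToyFsCacheDemo {
    public static void main(String[] args) {
        URI a = URI.create("s3a://bucket/data/part-0000");
        URI b = URI.create("s3a://bucket/other/file");

        ToyFsCache shared = new ToyFsCache(false);
        // Same bucket, same user: the shared instance is returned.
        System.out.println(shared.get(a, "alice") == shared.get(b, "alice")); // true
        // Same bucket, different user: a separate instance.
        System.out.println(shared.get(a, "alice") == shared.get(a, "bob"));   // false

        ToyFsCache unique = new ToyFsCache(true);
        // Caching disabled: every call creates a new instance.
        System.out.println(unique.get(a, "alice") == unique.get(a, "alice")); // false
    }
}
```

In this model a long-lived service handling many users ends up with many entries, while a single-user job ends up with one instance per store.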

What this means is that MR jobs *should* have one S3AFS instance per JVM (single-user app and
all), but services such as Hive LLAP will have many instances, created when queries come in
and released afterwards.

> Metrics of S3A don't print out  when enable it in Hadoop metrics property file
> ------------------------------------------------------------------------------
>                 Key: HADOOP-14475
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14475
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.8.0
>         Environment: uname -a
> Linux client01 4.4.0-74-generic #95-Ubuntu SMP Wed Apr 12 09:50:34 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>  cat /etc/issue
> Ubuntu 16.04.2 LTS \n \l
>            Reporter: Yonger
>            Assignee: Yonger
>         Attachments: s3a-metrics.patch1, stdout.zip
> *.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
> #*.sink.file.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #*.sink.influxdb.url=http:/xxxxxxxxxx
> #*.sink.influxdb.influxdb_port=8086
> #*.sink.influxdb.database=hadoop
> #*.sink.influxdb.influxdb_username=hadoop
> #*.sink.influxdb.influxdb_password=hadoop
> #*.sink.ingluxdb.cluster=c1
> *.period=10
> #namenode.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> #S3AFileSystem.sink.influxdb.class=org.apache.hadoop.metrics2.sink.influxdb.InfluxdbSink
> S3AFileSystem.sink.file.filename=s3afilesystem-metrics.out
> I can't find the output file even when I run an MR job which should be using S3.
