hadoop-common-issues mailing list archives

From "Igor Dvorzhak (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15124) Slow FileSystem.Statistics counters implementation
Date Thu, 21 Dec 2017 06:13:01 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299613#comment-16299613 ]

Igor Dvorzhak commented on HADOOP-15124:

Thank you for the feedback.

I would like to migrate FileSystem.Statistics to the new StorageStatistics backend.
I will make myself familiar with StorageStatistics code and will see from where better to

Meanwhile, I have reverted the changes to the public interface in my PR; it now uses both
ThreadLocal and LongAdder.

After this change there is no improvement in statistics write performance, and there is even
a small penalty, but it should be negligible because ThreadLocal.get is much more expensive
than LongAdder.add. Still, this change gets rid of all the complicated, synchronized logic for
statistics reads, which decreases the Wall time of the Statistics code from 6.49% to 1.06% in
a 1TB TeraGen job (CPU time increased to 29.4%, but total runtime still decreased from 66 to
62 minutes).
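
To make the trade-off concrete, here is a minimal sketch of a counter that is written through both
ThreadLocal and LongAdder and read without any locking. It is only an illustration: the class and
method names (StatsSketch, incrementBytesRead, and so on) are hypothetical and are not the code
from the PR or from FileSystem.Statistics.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration only; not the actual FileSystem.Statistics code.
class StatsSketch {
    // Shared total: cheap to update under contention, read with sum().
    private final LongAdder totalBytesRead = new LongAdder();

    // Per-thread view, kept for callers that need thread-level numbers.
    private final ThreadLocal<long[]> threadBytesRead =
        ThreadLocal.withInitial(() -> new long[1]);

    void incrementBytesRead(long n) {
        threadBytesRead.get()[0] += n; // the ThreadLocal lookup is the costly part
        totalBytesRead.add(n);         // the extra LongAdder.add is the small penalty
    }

    // Aggregate read: no synchronization and no walk over per-thread data.
    long getBytesRead() {
        return totalBytesRead.sum();
    }

    // Value accumulated by the calling thread only.
    long getThreadBytesRead() {
        return threadBytesRead.get()[0];
    }
}
```

Every write still pays for ThreadLocal.get, which is why write performance does not improve in
this variant, while the aggregate read becomes a single LongAdder.sum call.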

I think it could make sense to submit this PR before the migration to StorageStatistics,
because it could be backported to the 3.0 and 3.1 branches and provides some performance benefits.

Additionally, while per-thread statistics are useful, I think they are not used in regular
production job runs (I assume they are more valuable for performance tuning and bottleneck
debugging), so we could improve statistics write performance by introducing a property that
disables per-thread statistics. This would achieve the performance characteristics of my
initial PR while preserving all functionality and backward compatibility. What do you think?
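
A rough sketch of what such a switch could look like, assuming the flag is read once from a
configuration property (no property name has been chosen; the constructor argument below stands
in for it). All names here are hypothetical, and returning 0 for per-thread reads when the
feature is off is just one possible behavior.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical illustration only; not the actual FileSystem.Statistics code.
class ToggleableStats {
    // Assumed to come from a (not yet defined) configuration property.
    private final boolean perThreadEnabled;

    private final LongAdder totalBytesWritten = new LongAdder();
    private final ThreadLocal<long[]> threadBytesWritten =
        ThreadLocal.withInitial(() -> new long[1]);

    ToggleableStats(boolean perThreadEnabled) {
        this.perThreadEnabled = perThreadEnabled;
    }

    void incrementBytesWritten(long n) {
        totalBytesWritten.add(n);
        if (perThreadEnabled) {
            // Only pay for the ThreadLocal lookup when per-thread data is wanted.
            threadBytesWritten.get()[0] += n;
        }
    }

    long getBytesWritten() {
        return totalBytesWritten.sum();
    }

    // Reports 0 when per-thread statistics are disabled (an assumption of this sketch).
    long getThreadBytesWritten() {
        return perThreadEnabled ? threadBytesWritten.get()[0] : 0L;
    }
}
```

With the flag off, the write path is a single LongAdder.add, which is the behavior of the
initial PR.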

Another idea is to extend the Thread class (HadoopThread?) and keep a Statistics field in it
instead of using ThreadLocal - this would allow much faster per-thread statistics writes
without the need to disable them with a property, but it could be a more involved change that
is harder to maintain.
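
As a sketch of the extended-Thread idea, assuming a single counter per thread for simplicity
(real statistics are kept per file system, so the field would have to hold more than one value).
HadoopThreadSketch and ThreadFieldStats are hypothetical names; threads that are not of the
extended type fall back to ThreadLocal.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical thread subclass carrying its own counter.
class HadoopThreadSketch extends Thread {
    // Written only by this thread; other threads need a happens-before
    // edge (for example join()) before reading it.
    long bytesRead;

    HadoopThreadSketch(Runnable r) {
        super(r);
    }
}

// Hypothetical illustration only; not the actual FileSystem.Statistics code.
class ThreadFieldStats {
    private final LongAdder totalBytesRead = new LongAdder();
    private final ThreadLocal<long[]> fallback =
        ThreadLocal.withInitial(() -> new long[1]);

    void incrementBytesRead(long n) {
        Thread t = Thread.currentThread();
        if (t instanceof HadoopThreadSketch) {
            ((HadoopThreadSketch) t).bytesRead += n; // plain field write, no map lookup
        } else {
            fallback.get()[0] += n;                  // ordinary threads keep using ThreadLocal
        }
        totalBytesRead.add(n);
    }

    long getBytesRead() {
        return totalBytesRead.sum();
    }
}
```

The maintenance cost mentioned above shows up here as the instanceof branch: every place that
creates threads would have to use the subclass to get the fast path.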

Also, Netty has implemented the FastThreadLocal and FastThreadLocalThread classes
( https://netty.io/4.1/api/io/netty/util/concurrent/FastThreadLocal.html ) to address the issue
of slow ThreadLocal access, which we could consider too, but I prefer the idea of a dedicated
Statistics field in an extended Thread class, because it would perform better than even the
FastThreadLocal implementation.

> Slow FileSystem.Statistics counters implementation
> --------------------------------------------------
>                 Key: HADOOP-15124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15124
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: common
>    Affects Versions: 2.9.0, 2.8.3, 2.7.5, 3.0.0
>            Reporter: Igor Dvorzhak
>            Assignee: Igor Dvorzhak
>              Labels: common, filesystem, statistics
> While profiling 1TB TeraGen job on Hadoop 2.8.2 cluster (Google Dataproc, 2 workers,
> GCS connector) I saw that FileSystem.Statistics code paths Wall time is 5.58% and CPU time
> is 26.5% of total execution time.
> After switching FileSystem.Statistics implementation to LongAdder, consumed Wall time
> decreased to 0.006% and CPU time to 0.104% of total execution time.
> Total job runtime decreased from 66 mins to 61 mins.
> These results are not conclusive, because I didn't benchmark multiple times to average
> results, but regardless of performance gains switching to LongAdder simplifies code and reduces
> its complexity.
