hadoop-mapreduce-issues mailing list archives

From "Sangjin Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6735) Performance degradation caused by MAPREDUCE-5465 and HADOOP-12107
Date Tue, 19 Jul 2016 21:14:20 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384866#comment-15384866 ]

Sangjin Lee commented on MAPREDUCE-6735:

It is certainly surprising that HADOOP-12107 is making a difference on terasort. Which version
of Hadoop are you using for your test, and which Java version? Is it repeatable (i.e. does the
gap show up consistently)?

FYI, HADOOP-12107 has to do with *when* certain data ({{allData}}) inside the
{{FileSystem.Statistics}} objects gets cleaned up. Before this change, it would get cleaned up
only once the owner thread had been garbage collected *and* a read operation was performed on
the {{Statistics}} object. By read operations I mean methods such as {{getBytesRead()}} and so on.

After this change, the timing of this clean-up no longer depends on the read operations, and
it will be done promptly when the thread is garbage collected. So in a sense, the change first
ensures there is clean-up no matter what, and also moves up the timing of the clean-up.
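
To make the difference in timing concrete, here is a minimal sketch of the two strategies. It
is an illustration only, not the actual {{FileSystem.Statistics}} code: the class
{{StatsCleanupSketch}}, its methods, and the single {{bytesRead}} counter are hypothetical
names, and the sketch drains the reference queue through an explicit call, whereas a real
implementation would more likely do so from a background thread blocked on the queue.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified illustration; NOT the real FileSystem.Statistics implementation. */
public class StatsCleanupSketch {

    /** Per-thread counters, tied to the owner thread through a weak reference. */
    public static final class ThreadData extends WeakReference<Thread> {
        public long bytesRead;

        ThreadData(Thread owner, ReferenceQueue<Thread> queue) {
            super(owner, queue);
        }
    }

    // Receives a ThreadData entry once its owner thread has been collected.
    private final ReferenceQueue<Thread> collectedOwners = new ReferenceQueue<>();
    // Stand-in for the allData set discussed above.
    private final Set<ThreadData> allData = ConcurrentHashMap.newKeySet();
    // Totals folded in from entries whose owner thread is gone.
    private long rootBytesRead;

    public ThreadData register(Thread owner) {
        ThreadData data = new ThreadData(owner, collectedOwners);
        allData.add(data);
        return data;
    }

    /** Old timing: stale entries are dropped only as a side effect of a read. */
    public synchronized long getBytesReadLazy() {
        long total = rootBytesRead;
        for (Iterator<ThreadData> it = allData.iterator(); it.hasNext();) {
            ThreadData data = it.next();
            total += data.bytesRead;
            if (data.get() == null) { // owner thread has been collected
                rootBytesRead += data.bytesRead;
                it.remove();
            }
        }
        return total;
    }

    /** New timing: drain the queue whenever owners are collected, reads or not. */
    public synchronized int cleanUpCollectedOwners() {
        int removed = 0;
        Reference<? extends Thread> ref;
        while ((ref = collectedOwners.poll()) != null) {
            ThreadData data = (ThreadData) ref;
            if (allData.remove(data)) { // skip entries a read already removed
                rootBytesRead += data.bytesRead;
                removed++;
            }
        }
        return removed;
    }

    public int trackedEntries() {
        return allData.size();
    }
}
```

With the lazy variant, a job that never calls a read method keeps every entry in {{allData}}
and never pays for clean-up. With the queue-based variant, each collected thread costs one
queue drain and one set removal, regardless of whether anyone reads the statistics.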

The worst-case scenario in which this can have a negative impact on performance is if the
use case *never* reads the statistics. Prior to the change, as long as the heap could hold
these objects, no clean-up would ever be done. With the change, we now perform additional
clean-up each time a thread is garbage collected.

A subsequent observation is that the impact of the clean-up is greater if there is a *high
degree of thread churn* within the JVM. If we're talking about only a handful of threads or
long-lived threads, there should really be no difference.
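
One low-overhead way to check whether thread churn is in play is to sample the JVM's own
thread counters around the code of interest. The sketch below uses the standard
{{ThreadMXBean}} from {{java.lang.management}}; the class {{ThreadChurnProbe}} and the method
{{threadsStartedDuring}} are hypothetical names chosen for this illustration, and the loop in
{{main}} merely stands in for a real workload.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper that reports how many threads a piece of work starts. */
public class ThreadChurnProbe {

    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** Runs the task and returns the number of threads the JVM started meanwhile. */
    public static long threadsStartedDuring(Runnable task) {
        long before = THREADS.getTotalStartedThreadCount();
        task.run();
        return THREADS.getTotalStartedThreadCount() - before;
    }

    public static void main(String[] args) {
        // Stand-in workload: many short-lived threads, i.e. high thread churn.
        long started = threadsStartedDuring(() -> {
            for (int i = 0; i < 100; i++) {
                Thread t = new Thread(() -> { });
                t.start();
                try {
                    t.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        System.out.println("threads started: " + started
                + ", live threads now: " + THREADS.getThreadCount());
    }
}
```

A large started count combined with a small live count points to many short-lived threads,
which is the case where the additional clean-up would be expected to matter.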

I would greatly appreciate it if you could dig a little deeper via logging or low-overhead
profiling to pinpoint the correlation. Thanks.

> Performance degradation caused by MAPREDUCE-5465 and HADOOP-12107
> -----------------------------------------------------------------
>                 Key: MAPREDUCE-6735
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6735
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Alexandr Balitsky
> Two commits, MAPREDUCE-5465 and HADOOP-12107, are making Terasort on YARN 10% slower.
> Reduce phase with those commits ~5 mins
> Reduce phase without ~3.5 mins
> Average Reduce is taking 4mins, 16sec with those commits compared to 3mins, 48sec without.
