hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6884) Add LOG.isDebugEnabled() guard for each LOG.debug("...")
Date Sat, 28 Aug 2010 00:32:55 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903734#action_12903734
] 

Konstantin Shvachko commented on HADOOP-6884:
---------------------------------------------

I think Doug's request to provide benchmarks is reasonable. We should benchmark more.
So I modified {{NNThroughputBenchmark}}, to benchmark {{getFileStatus()}}. In current trunk
{{getFileStatus()}} calls {{Groups.getGroups()}}, which has a LOG.debug() in it. In {{NNThroughputBenchmark}}
you can specify a logLevel, so if I run it with {{logLevel = DEBUG}} the debug message from
{{Groups.getGroups()}} is printed for every {{getFileStatus()}} call, and no other messages
are printed.

I compared two versions of the code with {{logLevel = INFO}}:
# Unmodified {{Groups.getGroups()}}, where the argument for the debug call is calculated as
a sum of strings.
# Modified {{Groups.getGroups()}} with {{if(LOG.isDebugEnabled())}} before the logging statement.

Results:
# 10,000 calls of {{getFileStatus()}} in (1) yield 22,905 ops/sec
 10,000 calls of {{getFileStatus()}} in (2) yield 24,610 ops/sec
 This about 7% difference.
# I increased the namespace to 100,000 files and performed that many calls of {{getFileStatus()}}
for both setups. I still see 3-4% improvement in case (2).
  The gain is less because it is absorbed by the increased cost of navigating down the namespace
tree.
# Did the same with 1,000 files. (2) is better about 12-13%.

{{Groups.getGroups()}} is a very popular method it is a part of user authentication, so this
log message is a part of each and every name-node call. In many cases the cost of concatenating
strings in it will be absorbed by disk io operations, but it is still good to have this improvement.

I'll post the patch for {{NNThroughputBenchmark}} in another jira shortly.

> Add LOG.isDebugEnabled() guard for each LOG.debug("...")
> --------------------------------------------------------
>
>                 Key: HADOOP-6884
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6884
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 0.22.0
>            Reporter: Erik Steffl
>            Assignee: Erik Steffl
>             Fix For: 0.22.0
>
>         Attachments: FunAgain.java, FunAgain.java, HADOOP-6884-0.22-1.patch, HADOOP-6884-0.22.patch
>
>
> Each LOG.debug("...") should be executed only if LOG.isDebugEnabled() is true, in some
cases it's expensive to construct the string that is being printed to log. It's much easier
to always use LOG.isDebugEnabled() because it's easier to check (rather than in each case
reason whether it's necessary or not).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message