hadoop-common-issues mailing list archives

From "Jonathan Eagles (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14216) Improve Configuration XML Parsing Performance
Date Thu, 23 Mar 2017 22:48:41 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15939367#comment-15939367 ]

Jonathan Eagles commented on HADOOP-14216:
------------------------------------------

Client-Side Performance Tests:

Setup: Run normal user commands and measure the performance gain with only the client-side
hadoop-common.jar replaced by a patched version.

*Eyeball test*:
1. _hadoop fs -ls_
{code}
# baseline - ran dozens of times, this is a typical result
$ time hadoop fs -ls /
real	0m2.694s
user	0m6.633s
sys	0m0.303s

# patched version - ran dozens of times, this is a typical result
$ time HADOOP_USER_CLASSPATH_FIRST=true HADOOP_CLASSPATH="./hadoop-common-2.8.1-HADOOP-14216.jar:./stax2-api-3.1.4.jar:./aalto-xml-1.0.0.jar" hadoop fs -ls /
real	0m2.335s
user	0m4.963s
sys	0m0.292s
{code}
===========================
Result on a real cluster: roughly 300 ms real and 1700 ms user faster per hadoop fs -ls command.


2. _yarn application -list_
{code}
$ time yarn application -list
real	0m1.867s
user	0m5.178s
sys	0m0.288s

$ time YARN_USER_CLASSPATH="./hadoop-common-2.8.1-HADOOP-14216.jar:./stax2-api-3.1.4.jar:./aalto-xml-1.0.0.jar" YARN_USER_CLASSPATH_FIRST=true yarn application -list
real	0m1.607s
user	0m3.911s
sys	0m0.225s
{code}

===========================
Result on a real cluster: roughly 250 ms real and 1200 ms user faster per yarn application -list
command.

*Performance Numbers at scale*
{code:title=ConfPerf.java}
import org.apache.hadoop.conf.Configuration;

public class ConfPerf {
  public static void main(String[] args) throws Exception {
    long start = System.currentTimeMillis();
    long count = 0;
    Configuration.addDefaultResource("core-default.xml");
    Configuration.addDefaultResource("core-site.xml");
    Configuration.addDefaultResource("yarn-default.xml");
    Configuration.addDefaultResource("yarn-site.xml");
    Configuration.addDefaultResource("mapred-default.xml");
    Configuration.addDefaultResource("mapred-site.xml");
    Configuration.addDefaultResource("hdfs-default.xml");
    Configuration.addDefaultResource("hdfs-site.xml");
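    // Build 3000 Configuration objects so that resource loading dominates the run time.
    for (int i = 0; i < 3000; i++) {
      Configuration conf = new Configuration();
      // Resources are loaded lazily; looking up any key forces the XML to be parsed.
      conf.get("trigger.loading");
      count += conf.size();
    }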
    long end = System.currentTimeMillis();
    System.out.println("duration: " + (end - start) + " count: " + count);
  }
}
{code}

{code}
# setup performance tests
$ javac -cp ./:`hadoop classpath` ConfPerf.java

# baseline performance numbers
$ time java -cp ./:`hadoop classpath` ConfPerf
real	0m52.456s
user	1m2.209s
sys	0m3.601s

# performance numbers with patch
$ time java -cp ./:./hadoop-common-2.8.1-HADOOP-14216.jar:./stax2-api-3.1.4.jar:./aalto-xml-1.0.0.jar:`hadoop classpath` ConfPerf
real	0m23.108s
user	0m27.434s
sys	0m1.816s
{code}

===========================
Result on a real cluster: roughly 29300 ms real and 34800 ms user faster over 3000 Configuration loads.
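
To put the totals above in per-load terms, the sketch below divides the reported wall-clock times by the 3000 iterations in ConfPerf. The class and method names are hypothetical. Note that "real" time also includes JVM startup and class loading, so the per-load figures are upper bounds rather than pure parse times.

```java
import java.util.Locale;

public class ConfPerfMath {
  /** Average wall-clock milliseconds per Configuration load. */
  public static double perLoadMillis(long totalMillis, int loads) {
    return (double) totalMillis / loads;
  }

  /** Speedup factor of the patched run relative to the baseline run. */
  public static double speedup(long baselineMillis, long patchedMillis) {
    return (double) baselineMillis / patchedMillis;
  }

  public static void main(String[] args) {
    long baselineReal = 52_456; // "real 0m52.456s" from the baseline run
    long patchedReal = 23_108;  // "real 0m23.108s" from the patched run
    int loads = 3000;           // loop count in ConfPerf.java

    System.out.printf(Locale.ROOT, "baseline: %.2f ms/load%n",
        perLoadMillis(baselineReal, loads));
    System.out.printf(Locale.ROOT, "patched:  %.2f ms/load%n",
        perLoadMillis(patchedReal, loads));
    System.out.printf(Locale.ROOT, "speedup:  %.2fx%n",
        speedup(baselineReal, patchedReal));
  }
}
```

With the numbers reported above this works out to roughly 17.5 ms versus 7.7 ms per Configuration load, about a 2.27x speedup in wall-clock time.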

*Equality Test*
{code:title=ConfEquality.java}
import org.apache.hadoop.conf.Configuration;

public class ConfEquality {
  public static void main(String[] args) throws Exception {
    Configuration.addDefaultResource("core-default.xml");
    Configuration.addDefaultResource("core-site.xml");
    Configuration.addDefaultResource("yarn-default.xml");
    Configuration.addDefaultResource("yarn-site.xml");
    Configuration.addDefaultResource("mapred-default.xml");
    Configuration.addDefaultResource("mapred-site.xml");
    Configuration.addDefaultResource("hdfs-default.xml");
    Configuration.addDefaultResource("hdfs-site.xml");
    Configuration conf = new Configuration();
    conf.get("trigger.loading");
    conf.writeXml(System.out);
  }
}
{code}
{code}
# prepare the equality test
$ javac -cp ./:`hadoop classpath` ConfEquality.java
# run the equality test; no diff output means both versions produce identical XML
$ diff <(java -cp ./:`hadoop classpath` ConfEquality) <(java -cp ./:./hadoop-common-2.8.1-HADOOP-14216.jar:./stax2-api-3.1.4.jar:./aalto-xml-1.0.0.jar:`hadoop classpath` ConfEquality)
{code}
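
The diff above compares the XML written by each version. The same check could be sketched in-process, assuming the properties from each run have already been collected into maps (for example by iterating over a Configuration, which yields key/value entries). PropertyDiff and its method are hypothetical names for illustration and are not part of Hadoop or the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

public class PropertyDiff {
  /**
   * Returns one line per key whose value differs between the two maps,
   * including keys present on only one side. An empty list means equal.
   */
  public static List<String> diff(Map<String, String> baseline,
                                  Map<String, String> patched) {
    List<String> differences = new ArrayList<>();
    // Union of keys, sorted so the report is deterministic.
    TreeSet<String> keys = new TreeSet<>(baseline.keySet());
    keys.addAll(patched.keySet());
    for (String key : keys) {
      String before = baseline.get(key);
      String after = patched.get(key);
      if (!Objects.equals(before, after)) {
        differences.add(key + ": baseline=" + before + " patched=" + after);
      }
    }
    return differences;
  }

  public static void main(String[] args) {
    // Sample values are made up for the demonstration.
    Map<String, String> baseline = new HashMap<>();
    baseline.put("io.file.buffer.size", "4096");
    Map<String, String> patched = new HashMap<>(baseline);

    List<String> differences = diff(baseline, patched);
    System.out.println(differences.isEmpty() ? "identical" : differences.toString());
  }
}
```

Unlike a textual diff, this comparison ignores property ordering and whitespace in the serialized XML, so it is a weaker check than the one in the test above.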

> Improve Configuration XML Parsing Performance
> ---------------------------------------------
>
>                 Key: HADOOP-14216
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14216
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: HADOOP-14216.1.patch
>
>
> JIRA is to improve XML parsing performance through reuse and a change in XML parser (STAX)
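
For readers unfamiliar with StAX: it is a pull-style streaming XML API in which the caller advances a cursor through the document instead of building a DOM tree. The sketch below uses the JDK's javax.xml.stream API on a Hadoop-style configuration document. It is an illustration only, not the code in HADOOP-14216.1.patch; the class name, the shared factory, and the sample property values are assumptions, and it handles only name/value pairs.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxConfSketch {
  // Assumption: one factory created up front and shared (single-threaded use here).
  private static final XMLInputFactory FACTORY = XMLInputFactory.newInstance();

  /** Pull-parses property name/value pairs from a configuration document. */
  public static Map<String, String> parse(String xml) {
    Map<String, String> props = new LinkedHashMap<>();
    try {
      XMLStreamReader reader = FACTORY.createXMLStreamReader(new StringReader(xml));
      try {
        String name = null;
        String value = null;
        while (reader.hasNext()) {
          int event = reader.next();
          if (event == XMLStreamConstants.START_ELEMENT) {
            String tag = reader.getLocalName();
            if ("property".equals(tag)) {
              name = null;   // start of a new property: reset state
              value = null;
            } else if ("name".equals(tag)) {
              name = reader.getElementText().trim();
            } else if ("value".equals(tag)) {
              value = reader.getElementText();
            }
          } else if (event == XMLStreamConstants.END_ELEMENT
              && "property".equals(reader.getLocalName())) {
            if (name != null && value != null) {
              props.put(name, value);
            }
          }
        }
      } finally {
        reader.close();
      }
    } catch (XMLStreamException e) {
      throw new IllegalArgumentException("malformed configuration XML", e);
    }
    return props;
  }

  public static void main(String[] args) {
    String xml = "<configuration>"
        + "<property><name>fs.defaultFS</name><value>hdfs://example:8020</value></property>"
        + "<property><name>io.file.buffer.size</name><value>4096</value></property>"
        + "</configuration>";
    System.out.println(parse(xml));
  }
}
```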



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

