hadoop-common-issues mailing list archives

From "ASF GitHub Bot (Jira)" <j...@apache.org>
Subject [jira] [Work logged] (HADOOP-17222) Create socket address leveraging URI cache
Date Sat, 12 Sep 2020 20:48:01 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-17222?focusedWorklogId=483027&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-483027
]

ASF GitHub Bot logged work on HADOOP-17222:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/Sep/20 20:47
            Start Date: 12/Sep/20 20:47
    Worklog Time Spent: 10m 
      Work Description: liuml07 merged pull request #2241:
URL: https://github.com/apache/hadoop/pull/2241






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 483027)
    Time Spent: 3h 40m  (was: 3.5h)

>  Create socket address leveraging URI cache
> -------------------------------------------
>
>                 Key: HADOOP-17222
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17222
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: common, hdfs-client
>         Environment: HBase version: 2.1.0
> JVM: -Xmx2g -Xms2g 
> hadoop hdfs version: 2.7.4
> disk:SSD
> OS:CentOS Linux release 7.4.1708 (Core)
> JMH Benchmark: @Fork(value = 1) 
> @Warmup(iterations = 300) 
> @Measurement(iterations = 300)
>            Reporter: fanrui
>            Assignee: fanrui
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>         Attachments: After Optimization remark.png, After optimization.svg, Before Optimization
remark.png, Before optimization.svg
>
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Note: Not only the HDFS client benefits from this change; every caller of NetUtils.createSocketAddr
does. The HDFS client is used here only as an example.
>  
> The HDFS client selects the best DataNode (DN) for each HDFS block. The call stack is:
> DFSInputStream.chooseDataNode -> getBestNodeDNAddrPair -> NetUtils.createSocketAddr
> NetUtils.createSocketAddr creates the corresponding InetSocketAddress from the host
and port. The method performs some relatively expensive operations, for example
URI.create(target), so each call takes a noticeable amount of time.
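
To illustrate why this path costs CPU, here is a minimal sketch of turning a "host:port" target into an InetSocketAddress by way of URI.create. It is not the Hadoop implementation: the class SocketAddrSketch, the method parse, and the "dummy://" scheme prefix are assumptions made for illustration, and the address is left unresolved so that no DNS lookup happens.

```java
import java.net.InetSocketAddress;
import java.net.URI;

// Hypothetical helper for illustration only; not NetUtils.createSocketAddr.
final class SocketAddrSketch {

    // Parses "host:port" (or "scheme://host:port") into an unresolved address.
    static InetSocketAddress parse(String target, int defaultPort) {
        // A bare "host:port" string is not a hierarchical URI, so a dummy
        // scheme is prepended before parsing (an assumption of this sketch).
        boolean hasScheme = target.contains("://");
        URI uri = URI.create(hasScheme ? target : "dummy://" + target);

        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("Not a host:port pair: " + target);
        }
        // URI.getPort() returns -1 when the target carries no port.
        int port = uri.getPort() == -1 ? defaultPort : uri.getPort();

        // createUnresolved avoids a DNS lookup; a real client would resolve.
        return InetSocketAddress.createUnresolved(host, port);
    }
}
```

Every call re-parses the string and allocates a new URI, which is the kind of repeated work that becomes visible when the method sits on a hot read path.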
> The following performance report is based on HBase calling HDFS. HBase
is a high-frequency HDFS client, because HBase read operations often access a small
DataBlock (about 64k) rather than the entire HFile. Under such high-frequency access, the
time spent in NetUtils.createSocketAddr adds up.
> h3. Test Environment:
>  
> {code:java}
> HBase version: 2.1.0
> JVM: -Xmx2g -Xms2g 
> hadoop hdfs version: 2.7.4
> disk:SSD
> OS:CentOS Linux release 7.4.1708 (Core)
> JMH Benchmark: @Fork(value = 1) 
> @Warmup(iterations = 300) 
> @Measurement(iterations = 300)
> {code}
> h4. Before Optimization FlameGraph:
> In the figure, DFSInputStream.getBestNodeDNAddrPair accounts for 4.86%
of total CPU, and URI creation accounts for a large share of that.
> !Before Optimization remark.png!
> h3. Optimization ideas:
> NetUtils.createSocketAddr creates an InetSocketAddress from the host and port, so we can
add a cache for InetSocketAddress instances. The cache key is the host and port, and the value is the InetSocketAddress.
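
The caching idea can be sketched as follows, using a ConcurrentHashMap keyed by "host:port". This is a minimal sketch under stated assumptions, not the merged patch: the class SocketAddrCacheSketch and its methods get and size are hypothetical names, the cache is unbounded, and addresses are left unresolved to keep the example free of DNS lookups.

```java
import java.net.InetSocketAddress;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical cache for illustration only; not the code merged in the PR.
final class SocketAddrCacheSketch {

    // Key: "host:port". Value: the address built for that pair.
    private static final ConcurrentMap<String, InetSocketAddress> CACHE =
        new ConcurrentHashMap<>();

    // Returns the cached address, building it only on the first request.
    static InetSocketAddress get(String host, int port) {
        String key = host + ":" + port;
        // computeIfAbsent runs the factory at most once per key, even when
        // several threads ask for the same key concurrently.
        return CACHE.computeIfAbsent(
            key, k -> InetSocketAddress.createUnresolved(host, port));
    }

    // Number of distinct host:port pairs cached so far.
    static int size() {
        return CACHE.size();
    }
}
```

Repeated lookups for the same DataNode then return the same instance instead of re-parsing the target. A production cache would likely also need a size bound or expiry, and a policy for hosts whose IP address changes; both are left out of this sketch.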
> h4. After Optimization FlameGraph:
> In the figure, DFSInputStream.getBestNodeDNAddrPair accounts for 0.54%
of total CPU. A ConcurrentHashMap serves as the cache, and lookups go through
ConcurrentHashMap.get(). The CPU usage of DFSInputStream.getBestNodeDNAddrPair
drops from 4.86% to 0.54%.
> !After Optimization remark.png!
> h3. Original FlameGraph link:
> [Before Optimization FlameGraph|https://drive.google.com/file/d/133L5m75u2tu_KgKfGHZLEUzGR0XAfUl6/view?usp=sharing]
> [After Optimization FlameGraph|https://drive.google.com/file/d/133L5m75u2tu_KgKfGHZLEUzGR0XAfUl6/view?usp=sharing]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


