hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10662) Optimize UTF8 string/byte conversions
Date Tue, 02 Aug 2016 21:01:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15404773#comment-15404773 ]

Kihwal Lee commented on HDFS-10662:

Unfortunately, this just went in and the patch no longer applies.
commit a5fb298e56220a35d61b8d2bda716d8fb8ef8bb7
Author: Akira Ajisaka <aajisaka@apache.org>
Date:   Tue Aug 2 17:07:59 2016 +0900

    HDFS-10707. Replace org.apache.commons.io.Charsets with java.nio.charset.StandardCharsets.
    Contributed by Vincent Poon.

> Optimize UTF8 string/byte conversions
> -------------------------------------
>                 Key: HDFS-10662
>                 URL: https://issues.apache.org/jira/browse/HDFS-10662
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: HDFS-10662.patch
> String/byte conversions may take either a Charset instance or its canonical name. One
> might think a Charset instance would be faster because it avoids looking up and instantiating
> a Charset, but it's not. The canonical-name variants cache the string encoder/decoder
> (obtained from a Charset), which results in better performance.
> LOG4J2-935 describes a real-world performance boost. My micro-benchmarks showed only a marginal
> runtime improvement on JDK 7/8. However, for a 16-byte path, using the canonical name generated
> 50% less garbage; for a 64-byte path, 25% less. Given the sheer number of times that paths are
> (re)parsed, the cost adds up quickly.
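The following is a minimal sketch of the two overload families the description contrasts. It is not the HDFS-10662 patch, and the class `Utf8Conversions` and its methods are hypothetical names. On JDK 7/8, `String.getBytes(String)` and `new String(byte[], String)` go through `java.lang.StringCoding`, which keeps a per-thread cached encoder/decoder keyed by charset name. The `Charset` overloads create a fresh encoder/decoder on each call. Later JDKs reworked this code, so the garbage difference may not reproduce there.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical helper (not Hadoop's API) that routes UTF-8 conversions
 * through the charset-name overloads, which on JDK 7/8 reuse a cached
 * per-thread encoder/decoder instead of allocating one per call.
 */
public final class Utf8Conversions {
  // Canonical name "UTF-8", taken from StandardCharsets to avoid typos.
  private static final String UTF8_NAME = StandardCharsets.UTF_8.name();

  private Utf8Conversions() {}

  /** String to UTF-8 bytes via String.getBytes(String charsetName). */
  public static byte[] toBytes(String s) {
    try {
      return s.getBytes(UTF8_NAME);
    } catch (UnsupportedEncodingException e) {
      // Every Java platform must support UTF-8, so this is unreachable.
      throw new AssertionError(e);
    }
  }

  /** UTF-8 bytes to String via new String(byte[], String charsetName). */
  public static String fromBytes(byte[] bytes) {
    try {
      return new String(bytes, UTF8_NAME);
    } catch (UnsupportedEncodingException e) {
      throw new AssertionError(e);
    }
  }

  public static void main(String[] args) {
    String path = "/user/hdfs/dir/file";
    byte[] viaName = toBytes(path);                                // name-based overload
    byte[] viaCharset = path.getBytes(StandardCharsets.UTF_8);     // Charset overload
    System.out.println(Arrays.equals(viaName, viaCharset));        // same bytes either way
    System.out.println(fromBytes(viaName).equals(path));           // round trip succeeds
  }
}
```

The name-based overloads declare the checked `UnsupportedEncodingException`. A small wrapper like this keeps callers clean, and rethrowing as `AssertionError` is safe because UTF-8 is one of the charsets every Java implementation is required to support.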

This message was sent by Atlassian JIRA
