hadoop-hdfs-dev mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-10662) Optimize UTF8 string/byte conversions
Date Wed, 20 Jul 2016 18:15:20 GMT
Daryn Sharp created HDFS-10662:

             Summary: Optimize UTF8 string/byte conversions
                 Key: HDFS-10662
                 URL: https://issues.apache.org/jira/browse/HDFS-10662
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: hdfs
            Reporter: Daryn Sharp
            Assignee: Daryn Sharp

String/byte conversions (String.getBytes, new String) may take either a Charset instance or
its canonical name.  One might think a Charset instance would be faster because it avoids the
name lookup and Charset instantiation, but it is not.  The canonical string name variants cache
the string encoder/decoder (obtained from the Charset), which results in better performance.
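
For reference, the two call shapes being compared look roughly like the sketch below. The class
name, method names, and sample path are hypothetical and are not taken from the HDFS code base.
The sketch only shows that both variants produce the same bytes and strings; it does not measure
runtime or garbage.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
public class Utf8Conversions {

    // Variant A: pass a Charset instance.
    public static byte[] encodeWithCharset(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Variant B: pass the canonical charset name.
    public static byte[] encodeWithName(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new AssertionError(e);
        }
    }

    // Variant A: decode with a Charset instance.
    public static String decodeWithCharset(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Variant B: decode with the canonical charset name.
    public static String decodeWithName(byte[] bytes) {
        try {
            return new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        String path = "/user/example/part-00000"; // made-up sample path
        byte[] a = encodeWithCharset(path);
        byte[] b = encodeWithName(path);
        System.out.println("same bytes: " + Arrays.equals(a, b));
        System.out.println("round trip: " + decodeWithName(a).equals(decodeWithCharset(b)));
    }
}
```

The name-based variants declare the checked UnsupportedEncodingException, so a caller that
switches to them has to handle or wrap it, as the sketch does.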

LOG4J2-935 describes a real-world performance boost.  In my micro-benchmarks on JDK 7 and 8, the
runtime improvement was marginal.  However, for a 16-byte path, using the canonical name generated
50% less garbage.  For a 64-byte path, 25% of the garbage.  Given the sheer number of times that
paths are (re)parsed, the cost adds up quickly.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org
