hbase-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15569) Make Bytes.toStringBinary faster
Date Fri, 01 Apr 2016 07:50:25 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15221321#comment-15221321 ]

Hudson commented on HBASE-15569:
--------------------------------

FAILURE: Integrated in HBase-Trunk_matrix #820 (See [https://builds.apache.org/job/HBase-Trunk_matrix/820/])
HBASE-15569 Make Bytes.toStringBinary faster (stack: rev ff6a3395821fd1a7857b35b11d45b81743a75e61)
* hbase-common/src/main/java/org/apache/hadoop/hbase/util/Bytes.java


> Make Bytes.toStringBinary faster
> --------------------------------
>
>                 Key: HBASE-15569
>                 URL: https://issues.apache.org/jira/browse/HBASE-15569
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>            Reporter: Junegunn Choi
>            Assignee: Junegunn Choi
>            Priority: Minor
>             Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.2
>
>         Attachments: HBASE-15569.patch
>
>
> {{Bytes.toStringBinary}} is quite expensive due to its use of {{String.format}}.
> {{String.format}} seems like overkill for this purpose, and I could make the function
> up to 45 times faster by replacing that call with simpler hand-written code.
> This is probably a non-issue for the HBase server, as the function is not used in
> performance-sensitive contexts, but I figured it wouldn't hurt to make it faster: it is
> widely used in built-in tools (Shell, {{HFilePrettyPrinter}} with the {{-p}} option, etc.)
> and it can be used in clients.
> h4. Background:
> We have [an HBase monitoring tool|https://github.com/kakao/hbase-region-inspector] that
> periodically collects region information and calls {{Bytes.toStringBinary}} along the way
> to make some of that information suitable for display. Profiling revealed that a large
> portion of the processing time was spent in {{String.format}}.
> h4. Micro-benchmark:
> {code}
> import java.util.concurrent.TimeUnit;
> import org.apache.hadoop.hbase.util.Bytes;
>
> byte[] bytes = new byte[256];
> for (int i = 0; i < bytes.length; ++i) {
>   // Mixture of printable and non-printable characters.
>   // Maximal performance gain (45x) is observed when the array is solely
>   // composed of non-printable characters.
>   bytes[i] = (byte) i;
> }
> long started = System.nanoTime();
> for (int i = 0; i < 1000000; ++i) {
>   Bytes.toStringBinary(bytes);
> }
> System.out.println(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - started));
> {code}
> - Without the patch: 134176 ms
> - With the patch: 3890 ms
> I made sure that the new version returns the same value as before and simplified the
> check for non-printable characters.
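
For readers who want to see the idea in isolation, here is a minimal sketch of replacing a per-byte {{String.format}} call with a hex lookup table. It is not the committed patch: the class name {{HexEscapeSketch}}, both method names, and the printable-character rule (ASCII space through '~', excluding backslash) are assumptions made for illustration. The {{main}} method only checks that the two variants agree on all 256 byte values; it does not measure speed.

```java
public class HexEscapeSketch {
    // Lookup table used instead of String.format("%02X", ...).
    private static final char[] HEX_UPPER = "0123456789ABCDEF".toCharArray();

    // Assumption for this sketch: printable ASCII (space through '~') is
    // emitted as-is, except the backslash, which is escaped so that the
    // output stays unambiguous.
    public static boolean isPrintable(int ch) {
        return ch >= ' ' && ch <= '~' && ch != '\\';
    }

    // Baseline: one String.format call per escaped byte.
    public static String escapeWithFormat(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte value : b) {
            int ch = value & 0xFF; // treat the byte as unsigned
            if (isPrintable(ch)) {
                sb.append((char) ch);
            } else {
                sb.append(String.format("\\x%02X", ch));
            }
        }
        return sb.toString();
    }

    // Hand-written variant: two table lookups per escaped byte.
    public static String escapeByHand(byte[] b) {
        // Worst case is four output characters per input byte.
        StringBuilder sb = new StringBuilder(b.length * 4);
        for (byte value : b) {
            int ch = value & 0xFF;
            if (isPrintable(ch)) {
                sb.append((char) ch);
            } else {
                sb.append("\\x")
                  .append(HEX_UPPER[ch >>> 4])    // high nibble
                  .append(HEX_UPPER[ch & 0x0F]);  // low nibble
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same input shape as the micro-benchmark: every byte value once.
        byte[] bytes = new byte[256];
        for (int i = 0; i < bytes.length; ++i) {
            bytes[i] = (byte) i;
        }
        if (!escapeWithFormat(bytes).equals(escapeByHand(bytes))) {
            throw new AssertionError("variants disagree");
        }
        // prints a\x00\xFF\x5C
        System.out.println(escapeByHand(new byte[] {'a', 0, (byte) 0xFF, '\\'}));
    }
}
```

The lookup-table variant avoids parsing a format string and allocating a temporary {{String}} for each escaped byte, which is a plausible reason for the gap the reporter measured. A benchmark harness such as JMH would be needed to confirm the numbers on a given JVM.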



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
