hbase-dev mailing list archives

From "Junegunn Choi (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-15569) Make Bytes.toStringBinary faster
Date Thu, 31 Mar 2016 04:31:25 GMT
Junegunn Choi created HBASE-15569:

             Summary: Make Bytes.toStringBinary faster
                 Key: HBASE-15569
                 URL: https://issues.apache.org/jira/browse/HBASE-15569
             Project: HBase
          Issue Type: Improvement
          Components: Performance
            Reporter: Junegunn Choi
            Assignee: Junegunn Choi
            Priority: Minor

{{Bytes.toStringBinary}} is quite expensive due to its use of {{String.format}}. {{String.format}}
seems like overkill for this purpose, and I was able to make the function up to 45 times faster
by replacing that call with simpler hand-written code.
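
For illustration only, the general idea can be sketched as below. This is not the actual patch: the class and method names are hypothetical, and the printable-character rule (printable ASCII except backslash) is an assumption made for the sketch. It contrasts formatting each escaped byte through {{String.format}} with appending two hex digits from a lookup table, then prints whether the two outputs agree for all 256 byte values.

```java
// Hypothetical sketch; names and the printable rule are assumptions, not HBase code.
class HexEscapeSketch {
    private static final char[] HEX_UPPER = "0123456789ABCDEF".toCharArray();

    // Assumed rule for this sketch: printable ASCII except backslash is emitted as-is.
    static boolean isPrintable(int ch) {
        return ch >= ' ' && ch <= '~' && ch != '\\';
    }

    // Baseline: every escaped byte goes through the general-purpose formatter.
    static String escapeWithFormat(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte value : b) {
            int ch = value & 0xFF; // treat the byte as unsigned
            if (isPrintable(ch)) {
                sb.append((char) ch);
            } else {
                sb.append(String.format("\\x%02X", ch));
            }
        }
        return sb.toString();
    }

    // Hand-written variant: two table lookups instead of parsing a format string.
    static String escapeWithTable(byte[] b) {
        StringBuilder sb = new StringBuilder(b.length * 4);
        for (byte value : b) {
            int ch = value & 0xFF;
            if (isPrintable(ch)) {
                sb.append((char) ch);
            } else {
                sb.append("\\x")
                  .append(HEX_UPPER[ch >> 4])     // high nibble
                  .append(HEX_UPPER[ch & 0x0F]);  // low nibble
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = new byte[256];
        for (int i = 0; i < bytes.length; ++i) {
            bytes[i] = (byte) i;
        }
        // Prints whether both variants produce the same string.
        System.out.println(escapeWithFormat(bytes).equals(escapeWithTable(bytes)));
    }
}
```

The table-based variant avoids re-parsing the format string and allocating an intermediate {{String}} for every escaped byte, which is the kind of overhead the description attributes to {{String.format}}.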

This is probably a non-issue for the HBase server, as the function is not used in performance-sensitive
contexts, but I figured it wouldn't hurt to make it faster since it is widely used in built-in tools
(the Shell, {{HFilePrettyPrinter}} with the {{-p}} option, etc.) and can also be used in clients.

h4. Background:

We have [an HBase monitoring tool|https://github.com/kakao/hbase-region-inspector] that periodically
collects region information, and it calls {{Bytes.toStringBinary}} during the process
to make some of that information suitable for display. Profiling revealed that a large portion of
the processing time was spent in {{String.format}}.

h4. Micro-benchmark:

// Assumes org.apache.hadoop.hbase.util.Bytes and java.util.concurrent.TimeUnit are imported.
byte[] bytes = new byte[256];
for (int i = 0; i < bytes.length; ++i) {
  // Mixture of printable and non-printable characters.
  // Maximal performance gain (45x) is observed when the array is solely
  // composed of non-printable characters.
  bytes[i] = (byte) i;
}

long started = System.nanoTime();
for (int i = 0; i < 1000000; ++i) {
  Bytes.toStringBinary(bytes);
}
System.out.println(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - started));

- Without the patch: 134176 ms
- With the patch: 3890 ms

I verified that the new version returns the same values as before. The patch also simplifies the check
for non-printable characters.
