hbase-issues mailing list archives

From "Vikas Vishwakarma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17877) Replace/improve HBase's byte[] comparator
Date Tue, 11 Apr 2017 10:39:41 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15964156#comment-15964156
] 

Vikas Vishwakarma commented on HBASE-17877:
-------------------------------------------

I completed a few iterations today and also added a benchmark for the latest guava version,
with the following changes:
1. Have all the comparable benchmarks in JMH as suggested by [~stack]
2. Added Blackhole.consume for all the benchmark results as suggested by [~Apache9] (thanks
again!)
3. Used a slightly optimized random byte generation to reduce its impact on the benchmarks,
by drawing replacement bytes from a smaller byte array and splicing them into the input
arrays
4. Added the guava benchmarks (master branch) as suggested by [~larsh] above
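The input randomization in point 3 can be sketched roughly like this (class and method names, pool size, and seed are hypothetical, not from the actual benchmark code): pre-generate a small pool of random bytes up front and splice a few pool bytes into the input arrays per invocation, so the randomization cost stays small relative to the comparison being measured while still defeating JIT constant-folding of the inputs.

```java
import java.util.Random;

// Hypothetical sketch of cheap per-invocation input randomization
// for a comparator microbenchmark (not the actual patch code).
public class InputRandomizer {
    private final byte[] pool;                  // small pre-generated pool of random bytes
    private final Random random = new Random(42);

    public InputRandomizer(int poolSize) {
        pool = new byte[poolSize];
        random.nextBytes(pool);
    }

    // Overwrite a few positions of 'input' with bytes drawn from the
    // pool, so successive invocations compare slightly different data
    // without paying for a full array regeneration each time.
    public void perturb(byte[] input, int replacements) {
        for (int i = 0; i < replacements; i++) {
            int pos = random.nextInt(input.length);
            input[pos] = pool[random.nextInt(pool.length)];
        }
    }
}
```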

Observations
While the hadoop version was giving better performance overall, it was ~10% slower when
byteArrayLength % 8 != 0, most likely because of the last loop where it iterates over each
of the leftover bytes (it could also be down to compiler optimization). If I switch the
leftover-byte handling in the hadoop comparator to the HBase approach, I get exactly the
inverse result, i.e. worse when byteArraySize % 8 == 0 and better when byteArraySize % 8 != 0.
With the guava version, however, I am seeing better overall performance than both HBase
and hadoop. I had tried the guava version earlier as well, using native timestamp
before/after measurements, and in that case the hadoop comparator gave better results in
some cases; that could again be statistical variation, compiler optimizations, etc.
With the JMH framework, after fixing all the initial issues related to compiler/JIT
optimization (input byte array randomization, adding Blackhole, etc.), I am seeing
consistently better benchmarks for the guava version across all array sizes, whether
byteArraySize % 8 is zero or non-zero.
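For reference, the word-at-a-time comparison being discussed works roughly as below. This is a portable sketch, not the actual hadoop code: the real comparator uses sun.misc.Unsafe with native byte order, while this version uses big-endian ByteBuffer reads so that unsigned long comparison matches byte-wise lexicographic order. The final per-byte loop over the leftover bytes is the part suspected of costing ~10% when byteArraySize % 8 != 0.

```java
import java.nio.ByteBuffer;

// Portable sketch of a word-at-a-time lexicographic byte[] comparator.
public class WordComparator {
    public static int compare(byte[] a, byte[] b) {
        int minLen = Math.min(a.length, b.length);
        int wordLimit = minLen & ~7;        // largest multiple of 8 <= minLen
        ByteBuffer wa = ByteBuffer.wrap(a); // ByteBuffer defaults to big-endian
        ByteBuffer wb = ByteBuffer.wrap(b);
        int i = 0;
        // Main loop: compare 8 bytes at a time as unsigned longs.
        for (; i < wordLimit; i += 8) {
            long la = wa.getLong(i);
            long lb = wb.getLong(i);
            if (la != lb) {
                return Long.compareUnsigned(la, lb);
            }
        }
        // Leftover bytes (byteArraySize % 8 != 0): compared one at a time.
        for (; i < minLen; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }
}
```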

Looks like the master branch guava version is the best performing, and replacing the HBase
comparator with it should give the maximum gains (pending review):
https://github.com/google/guava/blob/master/guava/src/com/google/common/primitives/UnsignedBytes.java#L362
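One reason the linked guava comparator avoids the leftover-byte penalty inside a differing word: it XORs the two longs and uses a zero-count intrinsic to jump straight to the first differing byte rather than looping byte by byte. Guava uses Long.numberOfTrailingZeros for little-endian hardware; the big-endian equivalent below is a hedged sketch of the same idea, not guava's exact code.

```java
public class WordDiff {
    // Given two big-endian-interpreted words, return a lexicographic
    // comparison result by locating the first differing byte with
    // numberOfLeadingZeros instead of a per-byte loop.
    public static int compareWords(long la, long lb) {
        long xor = la ^ lb;
        if (xor == 0) {
            return 0;
        }
        // Round the leading-zero count down to a byte boundary, then
        // shift so the first differing byte lands in the low 8 bits.
        int shift = 56 - (Long.numberOfLeadingZeros(xor) & ~7);
        return ((int) (la >>> shift) & 0xff) - ((int) (lb >>> shift) & 0xff);
    }
}
```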

Results (min/mean/max across iterations; %diff is relative to HBase):
| |HBase| | |hadoop| | |hadoop %diff| | |guava| | |guava %diff| | |
|byte array size|min|mean|max|min|mean|max|min|mean|max|min|mean|max|min|mean|max|
|4|19814.642|20217.647|20250.91|19838.782|20072.437|20090.503|0|-1|-1|24026.12|24284.021|24300.338|21|20|20|
|8|19846.598|19874.477|19881.019|22012.932|22044.713|22051.793|11|11|11|22199.453|22253.173|22261.712|12|12|12|
|16|19400.623|19430.837|19438.378|19606.912|19616.322|19649.318|1|1|1|21995.475|22113.443|22120.836|13|14|14|
|20|18456.241|18493.416|18500.289|16482.859|16705.744|16776.35|-11|-10|-9|18625.111|18660.355|18704.285|1|1|1|
|32|18953.196|18984.412|18992.993|19307.22|19345.122|19352.411|2|2|2|21309.337|21359.051|21377.868|12|13|13|
|50|17444.431|17506.91|17518.791|15864.759|15941.543|15953.412|-9|-9|-9|18468.621|18613.202|18749.651|6|6|7|
|64|17390.097|18046.898|18143.835|20152.624|20379.32|20397.359|16|13|12|21065.113|21116.799|21128.523|21|17|16|
|100|14844.718|14866.353|14889.49|13293.668|13385.7|13403.439|-10|-10|-10|15594.286|15690.369|15796.081|5|6|6|
|128|14183.991|14329.948|14351.016|17016.59|17260.48|17278.799|20|20|20|17668.509|19205.199|19333.922|25|34|35|
|200|11665.597|11732.09|11748.27|11599.469|11733.228|11755.622|-1|0|0|14540.79|14648.077|14728.363|25|25|25|
|256|10404.438|10438.019|10444.734|13205.591|13315.903|13326.772|27|28|28|14448.858|14933.008|15064.242|39|43|44|
|512|6405.106|6592.613|6604.371|9031.652|9142.564|9149.54|41|39|39|10236.501|10376.17|10389.971|60|57|57|
|1024|3812.341|3832.237|3840.291|3863.105|3864.757|3871.94|1|1|1|6911.951|7002.067|7009.792|81|83|83|
|2048|2052.148|2060.585|2061.935|2129.32|2151.807|2155.381|4|4|5|4072.481|4085.278|4089.185|98|98|98|
|4096|1073.263|1089.947|1091.566|1069.962|1076.303|1076.993|0|-1|-1|2319.74|2326.514|2328.69|116|113|113|
|8192|544.723|547.063|547.449|863.716|866.808|867.296|59|58|58|931.945|1131.406|1136.288|71|107|108|
|16384|275.155|275.724|275.909|432.556|434.158|434.698|57|57|58|582.37|584.294|584.852|112|112|112|

Apologies for the multiple iterations; I am figuring out a lot myself while doing these
microbenchmark runs, and there are multiple dimensions to track in the test at different levels.

> Replace/improve HBase's byte[] comparator
> -----------------------------------------
>
>                 Key: HBASE-17877
>                 URL: https://issues.apache.org/jira/browse/HBASE-17877
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Vikas Vishwakarma
>         Attachments: 17877-1.2.patch, 17877-v2-1.3.patch, ByteComparatorJiraHBASE-17877.pdf
>
>
> [~vik.karma] did some extensive tests and found that Hadoop's version is faster - dramatically
faster in some cases.
> Patch forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
