hbase-issues mailing list archives

From "Vikas Vishwakarma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17877) Replace/improve HBase's byte[] comparator
Date Mon, 10 Apr 2017 05:45:41 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962461#comment-15962461 ]

Vikas Vishwakarma commented on HBASE-17877:
-------------------------------------------

[~larsh] here are the updated results. 

Sorry, I found an issue with my JMH tests, but after fixing it the results are very
encouraging. Initially I was using static byte arrays for the comparison, which I guess
were being optimized internally.

This was the old JMH benchmark code:
{code:title=OldBenchmarkCode.java|borderStyle=solid}
    // Static inputs: allocated and filled with random bytes, then reused as-is.
    ba1_8 = new byte[8];
    ba2_8 = new byte[8];
    r.nextBytes(ba1_8);
    r.nextBytes(ba2_8);

    compareToHadoop(ba1_8, 0, ba1_8.length, ba2_8, 0, ba2_8.length); {
        // hadoop comparator code
    }

    compareToHBase(ba1_8, 0, ba1_8.length, ba2_8, 0, ba2_8.length); {
        // hbase comparator code
    }
{code}

To avoid any such optimization, I changed the new JMH benchmark code so that each call
overwrites one randomly chosen byte in both byte arrays:
{code:title=NewBenchmarkCode.java|borderStyle=solid}
    ba1_8 = new byte[8];
    ba2_8 = new byte[8];
    r.nextBytes(ba1_8);
    r.nextBytes(ba2_8);

    // buffer1/buffer2 and length1/length2 are the comparator's parameters.
    compareToHadoop(ba1_8, 0, ba1_8.length, ba2_8, 0, ba2_8.length); {
        // Overwrite the same random position in both buffers before comparing.
        final int minLength = Math.min(length1, length2);
        int indx = r.nextInt(minLength);
        buffer1[indx] = (byte) 43;
        buffer2[indx] = (byte) 43;

        // hadoop comparator code
    }

    compareToHBase(ba1_8, 0, ba1_8.length, ba2_8, 0, ba2_8.length); {
        // Same mutation step as above, so both comparators see changing inputs.
        final int minLength = Math.min(length1, length2);
        int indx = r.nextInt(minLength);
        buffer1[indx] = (byte) 43;
        buffer2[indx] = (byte) 43;

        // hbase comparator code
    }
{code}
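For anyone who wants to try the mutation step outside JMH, here is a minimal plain-Java sketch of it. The class name, the fixed seed and the simple loop comparator are my own assumptions for illustration; the loop is only a stand-in for the elided comparator bodies and is not the HBase or Hadoop implementation. Unlike the snippet above, it also adds the offsets when indexing.

```java
import java.util.Random;

// Hypothetical sketch: none of these names come from HBase or Hadoop.
class MutatingCompareSketch {
    // Fixed seed is an assumption, chosen only to make runs repeatable.
    private static final Random R = new Random(42);

    // Plain unsigned lexicographic comparison, standing in for the elided comparator code.
    static int compareUnsigned(byte[] b1, int off1, int len1, byte[] b2, int off2, int len2) {
        int min = Math.min(len1, len2);
        for (int i = 0; i < min; i++) {
            int a = b1[off1 + i] & 0xff;
            int b = b2[off2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return len1 - len2;
    }

    // Overwrites the same random position in both buffers, then compares them.
    static int mutateAndCompare(byte[] b1, int off1, int len1, byte[] b2, int off2, int len2) {
        int min = Math.min(len1, len2);
        int indx = R.nextInt(min); // requires min > 0
        b1[off1 + indx] = (byte) 43;
        b2[off2 + indx] = (byte) 43;
        return compareUnsigned(b1, off1, len1, b2, off2, len2);
    }

    public static void main(String[] args) {
        byte[] ba1 = new byte[8];
        byte[] ba2 = new byte[8];
        R.nextBytes(ba1);
        R.nextBytes(ba2);
        System.out.println(mutateAndCompare(ba1, 0, ba1.length, ba2, 0, ba2.length));
    }
}
```

Since the same value is written to both buffers, the mutated position always compares equal afterwards.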

With the above changes I ran 20 warmup cycles and 100 iterations of 1 second each for
each array size (so the test duration per comparator is around 30 minutes). We can now
clearly see that throughput in ops/ms falls as the byte array size increases. The results
are below: the Hadoop comparator shows a very good improvement over the HBase comparator,
except in a few cases (sizes 20, 50, 64 and 100).
Iteration #1
|byte array diff index|HBase min|HBase mean|HBase max|Hadoop min|Hadoop mean|Hadoop max|%diff min|%diff mean|%diff max|
|4|36948.957|37047.507|37063.599|43624.207|43720.104|43736.301|18|18|18|
|8|27884.837|34081.159|34173.034|39546.43|39653.132|39683.029|42|16|16|
|16|32994.729|33606.42|33643.392|38950.12|39033.963|39050.588|18|16|16|
|20|31131.95|31262.936|31427.434|27721.608|27900.273|27934.124|-11|-11|-11|
|32|31564.556|31713.3|31729.588|36641.596|36875.77|36908.993|16|16|16|
|50|25651.127|25704.675|25720.617|21985.286|22783.331|23810.156|-14|-11|-7|
|64|23990.409|25744.616|25817.746|22774.009|22907.009|23040.051|-5|-11|-11|
|100|19559.995|19733.446|19766.259|17116.267|18049.88|19421.504|-12|-9|-2|
|128|20541.274|20564.717|20571.537|27311.353|27444.572|27467.086|33|33|34|
|200|14356.162|14376.86|14384.074|17341.848|17946.231|18587.39|21|25|29|
|256|13319.756|13615.766|13648.414|18262.812|18328.989|18337.549|37|35|34|
|512|8022.747|8053.372|8057.757|12494.631|12560.197|12569.778|56|56|56|
|1024|4368.514|4387.346|4390.766|7049.335|7144.239|7152.564|61|63|63|
|2048|2312.296|2316.975|2318.876|3735.84|3746.904|3748.395|62|62|62|
|4096|963.396|1173.651|1177.635|1854.35|1992.96|1998.702|92|70|70|
|8192|557.483|568.487|568.982|1021.296|1028.422|1029.441|83|81|81|
|16384|270.662|300.638|301.418|512.884|515.227|515.692|89|71|71|

Iteration #2
|byte array diff index|HBase min|HBase mean|HBase max|Hadoop min|Hadoop mean|Hadoop max|%diff min|%diff mean|%diff max|
|4|35456.243|37025.448|37064.285|43049.677|43680.577|43737.106|21|18|18|
|8|24971.846|33830.522|34169.057|38968.528|39633.455|39725.308|56|17|16|
|16|32733.421|32867.514|32887.865|38875.54|39031.123|39054.413|19|19|19|
|20|29543.281|31638.401|31887.656|27356.015|27902.406|27937.292|-7|-12|-12|
|32|31567.346|31707.575|31730.414|36795.993|36874.896|36905.325|17|16|16|
|50|25178.46|25716.801|25737.88|23123.396|23842.244|23954.188|-8|-7|-7|
|64|25232.908|25769.57|25790.104|23816.636|23926.496|23993.18|-6|-7|-7|
|100|18537.318|19401.317|19450.866|16679.562|17068.408|17110.839|-10|-12|-12|
|128|20516.499|20561.503|20570.657|25857.553|27023.148|27048.004|26|31|31|
|200|14368.006|14387.637|14399.758|16416.011|16620.799|16714.22|14|16|16|
|256|12164.544|13614.59|13646.761|19741.812|19873.067|19881.449|62|46|46|
|512|7949.774|7988.21|8000.377|12369.546|12555.549|12569.001|56|57|57|
|1024|4360.064|4388.154|4391.112|7058.351|7145.961|7152.739|62|63|63|
|2048|2282.581|2315.234|2318.591|3581.612|3741.999|3748.724|57|62|62|
|4096|1129.561|1141.206|1146.345|2013.874|2016.491|2017.712|78|77|76|
|8192|594.281|599.508|600.069|1022.809|1028.663|1029.265|72|72|72|
|16384|299.974|300.753|301.198|499.003|515.035|515.806|66|71|71|
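The %diff columns appear consistent with the change in Hadoop throughput relative to HBase throughput, rounded to a whole percent. A minimal sketch of that calculation follows; the helper name and the rounding rule are assumptions inferred from the numbers above, not something taken from the benchmark code.

```java
// Hypothetical helper; the name and rounding rule are assumptions inferred from the tables.
class PercentDiffSketch {
    // Percentage change of the Hadoop throughput relative to the HBase throughput.
    static long percentDiff(double hbaseOpsPerMs, double hadoopOpsPerMs) {
        return Math.round((hadoopOpsPerMs - hbaseOpsPerMs) / hbaseOpsPerMs * 100.0);
    }

    public static void main(String[] args) {
        // Mean throughput for sizes 4 and 20 from the first iteration's table.
        System.out.println(percentDiff(37047.507, 43720.104)); // 18
        System.out.println(percentDiff(31262.936, 27900.273)); // -11
    }
}
```

A positive value means the Hadoop comparator was faster for that row, a negative value means the HBase comparator was faster.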

> Replace/improve HBase's byte[] comparator
> -----------------------------------------
>
>                 Key: HBASE-17877
>                 URL: https://issues.apache.org/jira/browse/HBASE-17877
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Vikas Vishwakarma
>         Attachments: 17877-1.2.patch, 17877-v2-1.3.patch, ByteComparatorJiraHBASE-17877.pdf
>
>
> [~vik.karma] did some extensive tests and found that Hadoop's version is faster - dramatically faster in some cases.
> Patch forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
