hbase-issues mailing list archives

From "Jean-Marc Spaggiari (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
Date Mon, 17 Jun 2013 13:49:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685560#comment-13685560
] 

Jean-Marc Spaggiari commented on HBASE-6295:
--------------------------------------------

Other tests seem to be consistent, even if I don't get exactly the same results... I will
run some more.

bin/hbase org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList Loop 2 1 3000000 /tmp/biglinkedlist 1

Trunk:
2013-06-17 08:37:08,264 INFO  [main] mapred.JobClient: Job complete: job_local_0006
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient: Counters: 31
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient:   org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$Counts
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient:     REFERENCED=6000000
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient:   HBase Counters
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient:     REMOTE_RPC_CALLS=0
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient:     RPC_CALLS=609
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient:     RPC_RETRIES=0
2013-06-17 08:37:08,265 INFO  [main] mapred.JobClient:     NOT_SERVING_REGION_EXCEPTION=0
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient:     NUM_SCANNER_RESTARTS=0
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient:     MILLIS_BETWEEN_NEXTS=41071
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient:     BYTES_IN_RESULTS=360000000
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient:     BYTES_IN_REMOTE_RESULTS=0
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient:     REGIONS_SCANNED=4
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient:     REMOTE_RPC_RETRIES=0
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient:   File Output Format Counters
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient:     Bytes Written=8
2013-06-17 08:37:08,266 INFO  [main] mapred.JobClient:   FileSystemCounters
2013-06-17 08:37:08,267 INFO  [main] mapred.JobClient:     FILE_BYTES_READ=5696162333
2013-06-17 08:37:08,267 INFO  [main] mapred.JobClient:     FILE_BYTES_WRITTEN=6730223455
2013-06-17 08:37:08,267 INFO  [main] mapred.JobClient:   File Input Format Counters
2013-06-17 08:37:08,267 INFO  [main] mapred.JobClient:     Bytes Read=0
2013-06-17 08:37:08,267 INFO  [main] mapred.JobClient:   Map-Reduce Framework
2013-06-17 08:37:08,267 INFO  [main] mapred.JobClient:     Map output materialized bytes=414000024
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient:     Map input records=6000000
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient:     Reduce shuffle bytes=0
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient:     Spilled Records=39145720
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient:     Map output bytes=390000000
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient:     Total committed heap usage (bytes)=1303552000
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient:     CPU time spent (ms)=0
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient:     SPLIT_RAW_BYTES=422
2013-06-17 08:37:08,268 INFO  [main] mapred.JobClient:     Combine input records=0
2013-06-17 08:37:08,269 INFO  [main] mapred.JobClient:     Reduce input records=12000000
2013-06-17 08:37:08,269 INFO  [main] mapred.JobClient:     Reduce input groups=6000000
2013-06-17 08:37:08,269 INFO  [main] mapred.JobClient:     Combine output records=0
2013-06-17 08:37:08,269 INFO  [main] mapred.JobClient:     Physical memory (bytes) snapshot=0
2013-06-17 08:37:08,269 INFO  [main] mapred.JobClient:     Reduce output records=0
2013-06-17 08:37:08,269 INFO  [main] mapred.JobClient:     Virtual memory (bytes) snapshot=0
2013-06-17 08:37:08,269 INFO  [main] mapred.JobClient:     Map output records=12000000
2013-06-17 08:37:08,271 INFO  [main] test.IntegrationTestBigLinkedList$Loop: Verify finished with succees. Total nodes=6000000


Nic:
2013-06-17 08:44:47,530 INFO  [main] mapred.JobClient: Job complete: job_local_0006
2013-06-17 08:44:47,531 INFO  [main] mapred.JobClient: Counters: 31
2013-06-17 08:44:47,531 INFO  [main] mapred.JobClient:   org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$Counts
2013-06-17 08:44:47,531 INFO  [main] mapred.JobClient:     REFERENCED=6000000
2013-06-17 08:44:47,531 INFO  [main] mapred.JobClient:   HBase Counters
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient:     REMOTE_RPC_CALLS=0
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient:     RPC_CALLS=607
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient:     RPC_RETRIES=0
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient:     NOT_SERVING_REGION_EXCEPTION=0
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient:     NUM_SCANNER_RESTARTS=0
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient:     MILLIS_BETWEEN_NEXTS=39871
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient:     BYTES_IN_RESULTS=360000000
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient:     BYTES_IN_REMOTE_RESULTS=0
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient:     REGIONS_SCANNED=3
2013-06-17 08:44:47,532 INFO  [main] mapred.JobClient:     REMOTE_RPC_RETRIES=0
2013-06-17 08:44:47,533 INFO  [main] mapred.JobClient:   File Output Format Counters
2013-06-17 08:44:47,533 INFO  [main] mapred.JobClient:     Bytes Written=8
2013-06-17 08:44:47,533 INFO  [main] mapred.JobClient:   FileSystemCounters
2013-06-17 08:44:47,533 INFO  [main] mapred.JobClient:     FILE_BYTES_READ=5185648641
2013-06-17 08:44:47,533 INFO  [main] mapred.JobClient:     FILE_BYTES_WRITTEN=6110147770
2013-06-17 08:44:47,533 INFO  [main] mapred.JobClient:   File Input Format Counters
2013-06-17 08:44:47,533 INFO  [main] mapred.JobClient:     Bytes Read=0
2013-06-17 08:44:47,533 INFO  [main] mapred.JobClient:   Map-Reduce Framework
2013-06-17 08:44:47,533 INFO  [main] mapred.JobClient:     Map output materialized bytes=414000018
2013-06-17 08:44:47,534 INFO  [main] mapred.JobClient:     Map input records=6000000
2013-06-17 08:44:47,534 INFO  [main] mapred.JobClient:     Reduce shuffle bytes=0
2013-06-17 08:44:47,534 INFO  [main] mapred.JobClient:     Spilled Records=41455689
2013-06-17 08:44:47,534 INFO  [main] mapred.JobClient:     Map output bytes=390000000
2013-06-17 08:44:47,534 INFO  [main] mapred.JobClient:     Total committed heap usage (bytes)=1262878720
2013-06-17 08:44:47,534 INFO  [main] mapred.JobClient:     CPU time spent (ms)=0
2013-06-17 08:44:47,534 INFO  [main] mapred.JobClient:     SPLIT_RAW_BYTES=302
2013-06-17 08:44:47,534 INFO  [main] mapred.JobClient:     Combine input records=0
2013-06-17 08:44:47,534 INFO  [main] mapred.JobClient:     Reduce input records=12000000
2013-06-17 08:44:47,535 INFO  [main] mapred.JobClient:     Reduce input groups=6000000
2013-06-17 08:44:47,535 INFO  [main] mapred.JobClient:     Combine output records=0
2013-06-17 08:44:47,535 INFO  [main] mapred.JobClient:     Physical memory (bytes) snapshot=0
2013-06-17 08:44:47,535 INFO  [main] mapred.JobClient:     Reduce output records=0
2013-06-17 08:44:47,535 INFO  [main] mapred.JobClient:     Virtual memory (bytes) snapshot=0
2013-06-17 08:44:47,535 INFO  [main] mapred.JobClient:     Map output records=12000000
2013-06-17 08:44:47,536 INFO  [main] test.IntegrationTestBigLinkedList$Loop: Verify finished with succees. Total nodes=6000000







bin/hbase org.apache.hadoop.hbase.test.IntegrationTestLoadAndVerify -Dloadmapper.backrefs=10 -Dloadmapper.map.tasks=10 -Dloadmapper.num_to_write=100000 -Dverify.reduce.tasks=1 -Dverify.scannercaching=10000 loadAndVerify



Trunk:
2013-06-17 09:01:45,884 INFO  [main] mapred.JobClient: Job complete: job_local_0002
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient: Counters: 32
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:   HBase Counters
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     REMOTE_RPC_CALLS=0
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     RPC_CALLS=196
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     RPC_RETRIES=0
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     NOT_SERVING_REGION_EXCEPTION=0
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     NUM_SCANNER_RESTARTS=0
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     MILLIS_BETWEEN_NEXTS=19795
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     BYTES_IN_RESULTS=592892544
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     BYTES_IN_REMOTE_RESULTS=0
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     REGIONS_SCANNED=40
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     REMOTE_RPC_RETRIES=0
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:   org.apache.hadoop.hbase.test.IntegrationTestLoadAndVerify$Counters
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     ROWS_WRITTEN=0
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     REFERENCES_CHECKED=9855224
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:   File Output Format Counters
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     Bytes Written=8
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:   FileSystemCounters
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     FILE_BYTES_READ=12005531128
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     FILE_BYTES_WRITTEN=21471152830
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:   File Input Format Counters
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     Bytes Read=0
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:   Map-Reduce Framework
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     Map output materialized bytes=460630096
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     Map input records=1000000
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     Reduce shuffle bytes=0
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     Spilled Records=42262109
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     Map output bytes=438919408
2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     Total committed heap usage (bytes)=15392387072
2013-06-17 09:01:45,886 INFO  [main] mapred.JobClient:     CPU time spent (ms)=0
2013-06-17 09:01:45,886 INFO  [main] mapred.JobClient:     SPLIT_RAW_BYTES=4144
2013-06-17 09:01:45,886 INFO  [main] mapred.JobClient:     Combine input records=0
2013-06-17 09:01:45,886 INFO  [main] mapred.JobClient:     Reduce input records=10855224
2013-06-17 09:01:45,886 INFO  [main] mapred.JobClient:     Reduce input groups=1000000
2013-06-17 09:01:45,886 INFO  [main] mapred.JobClient:     Combine output records=0
2013-06-17 09:01:45,886 INFO  [main] mapred.JobClient:     Physical memory (bytes) snapshot=0
2013-06-17 09:01:45,886 INFO  [main] mapred.JobClient:     Reduce output records=0
2013-06-17 09:01:45,886 INFO  [main] mapred.JobClient:     Virtual memory (bytes) snapshot=0
2013-06-17 09:01:45,886 INFO  [main] mapred.JobClient:     Map output records=10855224




Nic:
2013-06-17 08:56:38,894 INFO  [main] mapred.JobClient: Job complete: job_local_0002
2013-06-17 08:56:38,895 INFO  [main] mapred.JobClient: Counters: 32
2013-06-17 08:56:38,895 INFO  [main] mapred.JobClient:   HBase Counters
2013-06-17 08:56:38,895 INFO  [main] mapred.JobClient:     REMOTE_RPC_CALLS=0
2013-06-17 08:56:38,895 INFO  [main] mapred.JobClient:     RPC_CALLS=196
2013-06-17 08:56:38,895 INFO  [main] mapred.JobClient:     RPC_RETRIES=0
2013-06-17 08:56:38,895 INFO  [main] mapred.JobClient:     NOT_SERVING_REGION_EXCEPTION=0
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:     NUM_SCANNER_RESTARTS=0
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:     MILLIS_BETWEEN_NEXTS=19384
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:     BYTES_IN_RESULTS=592944120
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:     BYTES_IN_REMOTE_RESULTS=0
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:     REGIONS_SCANNED=40
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:     REMOTE_RPC_RETRIES=0
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:   org.apache.hadoop.hbase.test.IntegrationTestLoadAndVerify$Counters
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:     ROWS_WRITTEN=0
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:     REFERENCES_CHECKED=9856145
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:   File Output Format Counters
2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:     Bytes Written=8
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:   FileSystemCounters
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:     FILE_BYTES_READ=12006648901
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:     FILE_BYTES_WRITTEN=21472928417
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:   File Input Format Counters
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:     Bytes Read=0
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:   Map-Reduce Framework
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:     Map output materialized bytes=460670620
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:     Map input records=1000000
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:     Reduce shuffle bytes=0
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:     Spilled Records=42265579
2013-06-17 08:56:38,897 INFO  [main] mapred.JobClient:     Map output bytes=438958090
2013-06-17 08:56:38,898 INFO  [main] mapred.JobClient:     Total committed heap usage (bytes)=15534960640
2013-06-17 08:56:38,898 INFO  [main] mapred.JobClient:     CPU time spent (ms)=0
2013-06-17 08:56:38,898 INFO  [main] mapred.JobClient:     SPLIT_RAW_BYTES=4144
2013-06-17 08:56:38,898 INFO  [main] mapred.JobClient:     Combine input records=0
2013-06-17 08:56:38,898 INFO  [main] mapred.JobClient:     Reduce input records=10856145
2013-06-17 08:56:38,898 INFO  [main] mapred.JobClient:     Reduce input groups=1000000
2013-06-17 08:56:38,898 INFO  [main] mapred.JobClient:     Combine output records=0
2013-06-17 08:56:38,898 INFO  [main] mapred.JobClient:     Physical memory (bytes) snapshot=0
2013-06-17 08:56:38,898 INFO  [main] mapred.JobClient:     Reduce output records=0
2013-06-17 08:56:38,898 INFO  [main] mapred.JobClient:     Virtual memory (bytes) snapshot=0
2013-06-17 08:56:38,899 INFO  [main] mapred.JobClient:     Map output records=10856145
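
To compare two runs without reading the counters by eye, the dumps above can be diffed
mechanically. The following is only a minimal sketch: the class name CounterDiff and its
methods are hypothetical helpers, not part of HBase or Hadoop, and the regular expression
assumes every counter line ends with NAME=<digits> as in the logs above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts NAME=value counters from JobClient log lines and diffs two runs. */
class CounterDiff {
    // Assumption: counter lines look like "... mapred.JobClient:     NAME=12345".
    private static final Pattern COUNTER =
            Pattern.compile("mapred\\.JobClient:\\s+(.+?)=(\\d+)\\s*$");

    /** Returns the counters found in the log text, in order of appearance. */
    static Map<String, Long> parse(String log) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : log.split("\\R")) {
            Matcher m = COUNTER.matcher(line);
            if (m.find()) {
                counters.put(m.group(1).trim(), Long.parseLong(m.group(2)));
            }
        }
        return counters;
    }

    /** Returns (other - base) for every counter present in both runs whose value differs. */
    static Map<String, Long> diff(Map<String, Long> base, Map<String, Long> other) {
        Map<String, Long> delta = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : base.entrySet()) {
            Long o = other.get(e.getKey());
            if (o != null && !o.equals(e.getValue())) {
                delta.put(e.getKey(), o - e.getValue());
            }
        }
        return delta;
    }

    public static void main(String[] args) {
        // Two lines from each IntegrationTestLoadAndVerify run above.
        String trunk =
                "2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     RPC_CALLS=196\n"
              + "2013-06-17 09:01:45,885 INFO  [main] mapred.JobClient:     MILLIS_BETWEEN_NEXTS=19795\n";
        String nic =
                "2013-06-17 08:56:38,895 INFO  [main] mapred.JobClient:     RPC_CALLS=196\n"
              + "2013-06-17 08:56:38,896 INFO  [main] mapred.JobClient:     MILLIS_BETWEEN_NEXTS=19384\n";
        // Prints "MILLIS_BETWEEN_NEXTS: -411"; RPC_CALLS is identical and therefore omitted.
        diff(parse(trunk), parse(nic)).forEach((name, d) -> System.out.println(name + ": " + d));
    }
}
```

Feeding it the full dumps instead of the two-line excerpts would list every counter that
differs between the Trunk and Nic runs.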



                
> Possible performance improvement in client batch operations: presplit and send in background
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6295
>                 URL: https://issues.apache.org/jira/browse/HBASE-6295
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, Performance
>    Affects Versions: 0.95.2
>            Reporter: Nicolas Liochon
>            Assignee: Nicolas Liochon
>              Labels: noob
>             Fix For: 0.98.0
>
>         Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>
> Today the batch algorithm is:
> {noformat}
> for Operation o: List<Op>{
>   add o to todolist
>   if todolist > maxsize or o last in list
>     split todolist per location
>     send split lists to region servers
>     clear todolist
>     wait
> }
> {noformat}
> We could:
> - create the final object immediately instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send as soon as there is enough data for a single location
> It would be:
> {noformat}
> for Operation o: List<Op>{
>   get location
>   add o to location.todolist
>   if (location.todolist > maxLocationSize)
>     send location.todolist to region server 
>     clear location.todolist
>     // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write once you add error management: the retried list must be shared with the operations added to the todolist. But it's doable.
> It's interesting mainly for 'big' writes.
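
The second pseudocode block in the quoted description can be sketched in Java as follows.
This is an illustration only, not the code from the attached patches: PerLocationBatcher,
locator, sender and maxLocationSize are hypothetical names, a location is reduced to a
plain String, the flush triggers when a buffer reaches the threshold, and the error
management and retries that the description calls non-trivial are left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical sketch: one todo list per location, each flushed in the background when full. */
class PerLocationBatcher<Op> {
    private final Function<Op, String> locator;        // maps an operation to its location
    private final BiConsumer<String, List<Op>> sender; // sends one batch to one location
    private final int maxLocationSize;
    private final ExecutorService pool;
    private final Map<String, List<Op>> pending = new HashMap<>();
    private final List<CompletableFuture<Void>> inFlight = new ArrayList<>();

    PerLocationBatcher(Function<Op, String> locator, BiConsumer<String, List<Op>> sender,
                       int maxLocationSize, ExecutorService pool) {
        this.locator = locator;
        this.sender = sender;
        this.maxLocationSize = maxLocationSize;
        this.pool = pool;
    }

    /** Splits ops per location as they arrive, sends full lists without waiting, then waits once. */
    void submitAll(List<Op> ops) {
        for (Op o : ops) {
            String location = locator.apply(o);                       // get location
            List<Op> todo = pending.computeIfAbsent(location, k -> new ArrayList<>());
            todo.add(o);                                              // add o to location.todolist
            if (todo.size() >= maxLocationSize) {
                send(location, pending.remove(location));             // send and clear, don't wait
            }
        }
        pending.forEach(this::send);                                  // send remaining
        pending.clear();
        CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join(); // wait
        inFlight.clear();
    }

    private void send(String location, List<Op> batch) {
        inFlight.add(CompletableFuture.runAsync(() -> sender.accept(location, batch), pool));
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        PerLocationBatcher<Integer> batcher = new PerLocationBatcher<>(
                o -> o % 2 == 0 ? "rs-even" : "rs-odd",               // made-up location names
                (loc, batch) -> System.out.println(loc + " <- " + batch),
                2, pool);
        batcher.submitAll(Arrays.asList(1, 2, 3, 4, 5));
        pool.shutdown();
    }
}
```

Because the batches are sent from a thread pool, the order in which they are printed can
vary between runs. A failure in any send surfaces only at the final join, which is where a
real implementation would need the shared retry list mentioned in the description.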

