hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jean-Marc Spaggiari (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
Date Thu, 20 Jun 2013 20:02:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689578#comment-13689578
] 

Jean-Marc Spaggiari commented on HBASE-6295:
--------------------------------------------

It's time for x lines, depending of the tests it's not the same number of lines.
For RandomReadTest you need to divide by 1048576
For RandomScanWithRange100Test you need to divide by 4096
For RandomSeekScanTest you need to divide by 40960.
For RandomWriteTest you need to divide by 1048576
For SequentialWriteTest you need to divide by 1048576

This is the number of lines per ms. So multiply by 1000 to have the same result. Some are
rows/minutes, so just adjust that.

So if you want to compare, here are the numbers in the same format as te PDF that I usually
produce:
||Test||Trunk||Nic||0.95||
|org,apache,hadoop,hbase,PerformanceEvaluation$RandomReadTest|1377.08|1420.14|1390.50|
|org,apache,hadoop,hbase,PerformanceEvaluation$RandomScanWithRange100Test|11243.12|10992.68|10971.09|
|org,apache,hadoop,hbase,PerformanceEvaluation$RandomSeekScanTest|304.66|296.43|305.25|
|org,apache,hadoop,hbase,PerformanceEvaluation$RandomWriteTest|9176.07|13619.59|9134.09|
|org,apache,hadoop,hbase,PerformanceEvaluation$SequentialWriteTest|13592.40|42655.52|13255.12|

I already noticed the RandomWriteTest impact compared to 0.94 branch and 0.95...

I will re-run the 0.94 tests to make sure, but overall, I really think 0.95 is not doing as
good as 0.95 for the RandomWriteTest.
                
> Possible performance improvement in client batch operations: presplit and send in background
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6295
>                 URL: https://issues.apache.org/jira/browse/HBASE-6295
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, Performance
>    Affects Versions: 0.95.2
>            Reporter: Nicolas Liochon
>            Assignee: Nicolas Liochon
>              Labels: noob
>             Fix For: 0.98.0
>
>         Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 6295.v1.patch, 6295.v2.patch,
6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>
> today batch algo is:
> {noformat}
> for Operation o: List<Op>{
>   add o to todolist
>   if todolist > maxsize or o last in list
>     split todolist per location
>     send split lists to region servers
>     clear todolist
>     wait
> }
> {noformat}
> We could:
> - create immediately the final object instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send it when there is enough data
for a single location
> It would be:
> {noformat}
> for Operation o: List<Op>{
>   get location
>   add o to todo location.todolist
>   if (location.todolist > maxLocationSize)
>     send location.todolist to region server 
>     clear location.todolist
>     // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write if you add error management: retried list must be shared with
the operations added in the todolist. But it's doable.
> It's interesting mainly for 'big' writes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message