hbase-issues mailing list archives

From "Nicolas Liochon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background
Date Fri, 17 May 2013 21:59:17 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661080#comment-13661080 ]

Nicolas Liochon commented on HBASE-6295:

bq. HConnectionImplementation which are not in HConnection
The problem I have here is the link between HTable and HConnectionImplementation; I don't have a nice solution for it.

bq. I think including tableName and row.getRow() in exception message would help debug.
I've done it for tableName but not for getRow, as the row can sometimes be quite big.

bq. Also, there's still large scale (hundreds of lines) copy-pasted code shared between AsyncProcess
and Process. If we don't get rid of Process fast (and I suspect realistically we won't) it
can become a problem. Can at least some shared code be made shared?
That's the big one. 'Process' is not a public class. I tried to reimplement the functions that use it with AsyncProcess. The tests don't pass locally yet; I will push to RB once they do.

bq. Is it a legal condition?
It's historical: it means someone could send a list with some nulls in the middle. I preferred to keep that behavior.

bq. Since getWriteBuffer is removed and there's no way to get at this buffer.
I removed it because it was not in HTableInterface and it was an implementation leak. That said, everybody uses HTable directly, so I put it back.

bq. Code in HTable looks very non-thread-safe, I am assuming that is ok.
Yes, HTable is not thread-safe by design. The idea is to have no locks at all in this class (but I had to add some in AsyncProcess, as the callbacks introduce some multithreading).
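To make that split concrete, here is a minimal sketch, assuming a single caller thread (as with HTable) that submits work to a pool, and a completion callback that runs on a pool thread. `InFlightTracker`, `taskSent`, `taskDone` and `waitUntilDone` are hypothetical names, not the AsyncProcess API from the patch. The point it illustrates: the only state the callbacks touch is the in-flight counter, so that is the only thing that needs a lock.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not the AsyncProcess code from the patch.
class InFlightTracker {
    private int inProgress = 0; // guarded by "this"; the only state shared with callbacks

    // Caller thread: called just before a task is submitted.
    synchronized void taskSent() {
        inProgress++;
    }

    // Pool thread: called from the completion callback.
    synchronized void taskDone() {
        inProgress--;
        notifyAll(); // wake the caller if it is blocked in waitUntilDone()
    }

    // Caller thread: blocks until every submitted task has completed.
    synchronized void waitUntilDone() throws InterruptedException {
        while (inProgress > 0) {
            wait();
        }
    }

    synchronized int inProgress() {
        return inProgress;
    }
}

class InFlightDemo {
    // Submits n tasks from a single caller thread and returns the in-flight count after waiting.
    static int run(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        InFlightTracker tracker = new InFlightTracker();
        try {
            for (int i = 0; i < n; i++) {
                tracker.taskSent();
                pool.execute(() -> {
                    // A real client would send a batch to a region server here.
                    tracker.taskDone(); // the "callback", running on a pool thread
                });
            }
            tracker.waitUntilDone();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return tracker.inProgress();
    }

    public static void main(String[] args) {
        System.out.println(run(100)); // prints 0
    }
}
```

The caller side stays lock-free in spirit: it only increments before submitting and then blocks once, which matches the "no lock in HTable, some locks in the async part" design described above.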

> Possible performance improvement in client batch operations: presplit and send in background
> --------------------------------------------------------------------------------------------
>                 Key: HBASE-6295
>                 URL: https://issues.apache.org/jira/browse/HBASE-6295
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, Performance
>    Affects Versions: 0.95.2
>            Reporter: Nicolas Liochon
>            Assignee: Nicolas Liochon
>              Labels: noob
>         Attachments: 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch,
> Today the batch algorithm is:
> {noformat}
> for Operation o: List<Op>{
>   add o to todolist
>   if todolist > maxsize or o last in list
>     split todolist per location
>     send split lists to region servers
>     clear todolist
>     wait
> }
> {noformat}
> We could:
> - create the final object immediately instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send a location's list as soon as there is enough data for that single location
> It would be:
> {noformat}
> for Operation o: List<Op>{
>   get location
>   add o to location.todolist
>   if (location.todolist > maxLocationSize)
>     send location.todolist to region server 
>     clear location.todolist
>     // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write once you add error management: the list of retried operations must be shared with the operations being added to the todolist. But it's doable.
> It's interesting mainly for 'big' writes.
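The current ("today") loop from the description can be sketched in Java as follows. This is an illustration under stated assumptions, not HBase code: operations are plain row keys, the hypothetical `locate` function maps a row to a region server name, and each "send" is simply recorded as one round of per-server lists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the "today" loop; CurrentBatch and locate are illustrative names.
class CurrentBatch {
    static List<Map<String, List<String>>> run(List<String> ops, int maxSize,
                                               Function<String, String> locate) {
        List<Map<String, List<String>>> rounds = new ArrayList<>();
        List<String> todo = new ArrayList<>(); // intermediate list, not yet split
        for (int i = 0; i < ops.size(); i++) {
            todo.add(ops.get(i));
            boolean last = i == ops.size() - 1;
            if (todo.size() >= maxSize || last) { // flush once the list reaches maxSize
                // Only now is the whole todolist split per location.
                Map<String, List<String>> perServer = new HashMap<>();
                for (String op : todo) {
                    perServer.computeIfAbsent(locate.apply(op), k -> new ArrayList<>()).add(op);
                }
                rounds.add(perServer); // "send split lists to region servers"
                todo.clear();
                // "wait": a real client blocks here until every server has replied.
            }
        }
        return rounds;
    }
}
```

With the rows `a1, b1, a2, b2`, `maxSize = 2` and `locate = row -> row.substring(0, 1)`, this produces two rounds, each sending a one-element list to each server with a wait in between; those small per-server lists and the per-round wait are what the proposal tries to avoid.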
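The proposed loop can be sketched the same way. Again the names (`PresplitBatch`, `locate`, `send`) are hypothetical, and the "RPC" only records `server:batch` strings from a pool thread. Each location keeps its own todolist, a full list is handed to a pool thread immediately, and the caller waits only once, at the end.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical sketch of the proposed loop; none of these names come from HBase.
class PresplitBatch {
    static List<String> run(List<String> ops, int maxLocationSize,
                            Function<String, String> locate) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        ConcurrentLinkedQueue<String> sent = new ConcurrentLinkedQueue<>();
        List<CompletableFuture<Void>> inFlight = new ArrayList<>();
        Map<String, List<String>> todo = new HashMap<>(); // one todolist per location
        try {
            for (String op : ops) {
                String server = locate.apply(op); // get location
                List<String> list = todo.computeIfAbsent(server, k -> new ArrayList<>());
                list.add(op);
                if (list.size() >= maxLocationSize) {
                    inFlight.add(send(pool, sent, server, list));
                    todo.remove(server); // "clear": the sender now owns the old list
                    // don't wait, continue the loop
                }
            }
            // send remaining
            todo.forEach((server, list) -> inFlight.add(send(pool, sent, server, list)));
            // wait, once
            CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(sent);
    }

    // Stand-in for an RPC to the region server: records "server:[ops]" on a pool thread.
    private static CompletableFuture<Void> send(ExecutorService pool,
                                                ConcurrentLinkedQueue<String> sent,
                                                String server, List<String> batch) {
        return CompletableFuture.runAsync(() -> sent.add(server + ":" + batch), pool);
    }
}
```

With the rows `a1, b1, a2, b2, a3` and `maxLocationSize = 2`, the lists `[a1, a2]` and `[b1, b2]` are sent while the loop is still running, and `[a3]` goes out with the remainder. Removing the full list from the map, rather than clearing it in place, matters here: a pool thread may still be reading the old list while the loop starts a new one.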
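The error-management point can also be sketched, assuming failures are reported by callbacks on pool threads. In this hypothetical `RetryingBatch`, failed operations go onto a concurrent queue, and the submitting thread drains that queue into the same per-location todolists it is filling with new operations, so retries and new writes are batched together. `failOnce` simulates rows whose first attempt fails; a real client would also cap the number of retries and back off, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

// Hypothetical sketch, not HBase code. Only the submitting thread touches `todo` and
// `inFlight`; pool threads report back through the two concurrent queues.
class RetryingBatch {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final ConcurrentLinkedQueue<String> retries = new ConcurrentLinkedQueue<>();
    private final ConcurrentLinkedQueue<String> done = new ConcurrentLinkedQueue<>();
    private final List<CompletableFuture<Void>> inFlight = new ArrayList<>();
    private final Map<String, List<String>> todo = new HashMap<>(); // one todolist per location
    private final Function<String, String> locate;
    private final Set<String> failOnce; // must be a concurrent set, e.g. ConcurrentHashMap.newKeySet()
    private final int maxLocationSize;

    RetryingBatch(Function<String, String> locate, Set<String> failOnce, int maxLocationSize) {
        this.locate = locate;
        this.failOnce = failOnce;
        this.maxLocationSize = maxLocationSize;
    }

    // Runs the batch once (the pool is shut down afterwards) and returns the successful ops.
    List<String> run(List<String> ops) {
        try {
            for (String op : ops) {
                drainRetries(); // retried ops share the todolists with new ops
                add(op);
            }
            while (true) {
                drainRetries();
                todo.values().forEach(this::send); // send remaining
                todo.clear();
                CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join(); // wait
                inFlight.clear();
                if (retries.isEmpty()) {
                    break; // nothing failed in the last round
                }
            }
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(done);
    }

    private void drainRetries() {
        String op;
        while ((op = retries.poll()) != null) {
            add(op);
        }
    }

    private void add(String op) {
        String server = locate.apply(op);
        List<String> list = todo.computeIfAbsent(server, k -> new ArrayList<>());
        list.add(op);
        if (list.size() >= maxLocationSize) {
            todo.remove(server); // the sender now owns this list
            send(list);          // don't wait
        }
    }

    // Stand-in for an RPC: on a pool thread, each op either succeeds or is queued for retry.
    private void send(List<String> batch) {
        inFlight.add(CompletableFuture.runAsync(() -> {
            for (String op : batch) {
                if (failOnce.remove(op)) {
                    retries.add(op); // simulated failure, handed back to the submitting thread
                } else {
                    done.add(op);
                }
            }
        }, pool));
    }
}
```

Whatever the thread timing, each operation ends up in `done` exactly once: a failed operation is only ever re-added through the retry queue, and the final loop keeps flushing and waiting until a round completes with no retries.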
