hbase-issues mailing list archives

From "Jason Bray (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8762) Performance/operational penalty when calling HTable.get with a list of one Get
Date Tue, 18 Jun 2013 22:57:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687333#comment-13687333 ]

Jason Bray commented on HBASE-8762:

Thanks for taking a look [~saint.ack@gmail.com].  

Regarding the javadoc of HTableInterface: I'm happy to change it, but by bringing it up in
this jira I may have confused the issue.  To be clear, the javadoc seems incorrect with or
without the patch I've provided.  The RetriesExhaustedWithDetailsException is thrown and
propagates up the stack, so the Result[] is never returned at all.  The javadoc change
would simply be to have the statement read "If there are any failures even after retries, an
exception will be thrown."

If we're in agreement, I can either create a second jira with just that javadoc change or
amend this patch to include both changes, whichever is more appropriate.

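
To illustrate the javadoc point, here is a minimal sketch that uses hypothetical stand-in
types rather than the real HBase client classes.  RetriesExhausted and batchGet are invented
names, and the stand-in exception is unchecked purely to keep the example short.  It only
shows that an exception thrown after retries leaves the caller with no results array at all.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins only; this is NOT the HBase client API.
final class BatchFailureSketch {

    /** Stand-in for RetriesExhaustedWithDetailsException (assumption: unchecked for brevity). */
    static final class RetriesExhausted extends RuntimeException {
        RetriesExhausted(String message) {
            super(message);
        }
    }

    /** Stand-in batch get: any row starting with "bad" counts as a failed action. */
    static String[] batchGet(List<String> rows) {
        String[] results = new String[rows.size()];
        int failures = 0;
        for (int i = 0; i < rows.size(); i++) {
            if (rows.get(i).startsWith("bad")) {
                failures++; // this slot stays null
            } else {
                results[i] = "value-of-" + rows.get(i);
            }
        }
        if (failures > 0) {
            // Thrown before the return statement, so the partially
            // filled array never reaches the caller.
            throw new RetriesExhausted(failures + " action(s) failed");
        }
        return results;
    }

    public static void main(String[] args) {
        String[] results = null;
        try {
            results = batchGet(Arrays.asList("row-1", "bad-row", "row-3"));
        } catch (RetriesExhausted e) {
            System.out.println("caught: " + e.getMessage());
        }
        System.out.println("results is null: " + (results == null));
    }
}
```
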
> Performance/operational penalty when calling HTable.get with a list of one Get
> ------------------------------------------------------------------------------
>                 Key: HBASE-8762
>                 URL: https://issues.apache.org/jira/browse/HBASE-8762
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>            Reporter: Jason Bray
>            Priority: Minor
>         Attachments: HBASE-8762.patch
> There are two implications to calling HTable.get with a list of one Get.
> 1. The overhead of processBatch is paid unnecessarily, which is not insignificant.
> 2. The get requests show up as a 'multi' when reviewing RPC handlers, when the request
> should just be a single Get.  It seems likely that there are other places in logs/ui it shows
> up as a multi as well.
> To give some context to the overhead, here are some timings performed by a member of
> our team:
> In a very simple test that read the same key 100 times, recorded the elapsed time, and
> then repeated this 10 times (1000 total gets), the times were as follows (excluding the actual
> first iteration, as there was considerable HBase warm-up time on the JVM for establishing
> ||Iteration||Batch (in ms)||Single (in ms)||
> |1|2255|815| 
> |2|1545|823| 
> |3|1427|742| 
> |4|1451|721| 
> |5|1480|775| 
> |6|1379|735| 
> |7|1657|775| 
> |8|1392|804|
> While I can see the argument that callers should use the single Get method signature,
> the cost implications are somewhat surprising, and it's very easy to be smart in this case.
> We simply need to have HTable.get(List<Get>) delegate to HTable.get(Get) if
> the list has one Get.
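
The delegation described in the issue can be sketched roughly as follows.  This is not the
attached patch.  FakeTable, Get, Result and processBatch here are hypothetical stand-ins
written for illustration, with call counters added so the chosen code path is visible.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins only; this is NOT the real HTable implementation.
final class SingleGetDelegation {

    /** Stand-in for a Get: just a row key. */
    static final class Get {
        final String row;
        Get(String row) { this.row = row; }
    }

    /** Stand-in for a Result: echoes the row it was fetched for. */
    static final class Result {
        final String row;
        Result(String row) { this.row = row; }
    }

    /** Stand-in table that counts which code path each call takes. */
    static final class FakeTable {
        int singleCalls = 0;
        int batchCalls = 0;

        /** Single-get path. */
        Result get(Get get) {
            singleCalls++;
            return new Result(get.row);
        }

        /** List path: a one-element list is routed to the single-get path. */
        Result[] get(List<Get> gets) {
            if (gets.size() == 1) {
                return new Result[] { get(gets.get(0)) };
            }
            return processBatch(gets);
        }

        /** Stand-in for the more expensive batch ('multi') path. */
        private Result[] processBatch(List<Get> gets) {
            batchCalls++;
            Result[] results = new Result[gets.size()];
            for (int i = 0; i < gets.size(); i++) {
                results[i] = new Result(gets.get(i).row);
            }
            return results;
        }
    }

    public static void main(String[] args) {
        FakeTable table = new FakeTable();
        table.get(Collections.singletonList(new Get("row-1")));        // single path
        table.get(Arrays.asList(new Get("row-1"), new Get("row-2")));  // batch path
        System.out.println("single=" + table.singleCalls + " batch=" + table.batchCalls);
        // prints "single=1 batch=1"
    }
}
```

Callers still receive a Result[] of length one in the single-element case, so the change
would be invisible to them apart from the cheaper code path.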

