hbase-issues mailing list archives

From "nkeywal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5924) In the client code, don't wait for all the requests to be executed before resubmitting a request in error.
Date Thu, 07 Jun 2012 09:21:23 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13290899#comment-13290899 ]

nkeywal commented on HBASE-5924:


bq. 'origin' -> 'original', 'what are the actions to replay' -> 'what actions to replay'

bq. InterruptedIOException should be thrown.

bq. The above is hard to read. A period between 'records' and 'results'? A period between
'list' and 'walk'?
It was already like that before :-). But I agree, it's clearer with some periods.

bq. hbase-server/src/main/java/org/apache/hadoop/hbase/util/Triple.java was not included in
patch v9.


bq. Did you add the history in the first place? Why is it safe to remove it now?
In the previous code we updated the locations cache several times for the same row,
and the second time without the RegionMovedException. So we had to record that the error
had already been taken into account for this row. We now update the locations cache
only once, so we no longer need to store the history.

bq. On your three comments above, on 1., on the unused code, it may not be triggered by the
test suite – that could just be bad test coverage – but independently, there may have been
a reason for it. If your review of processBatchCallback has it making no sense, by all means
purge it (as you have done).

Yep, for this one, removing it simplifies the algorithm, as I can find the original

bq. On 2., the callback, it looks like you kept it. I think that's sensible. On 3., can we move
it to HTable? Deprecate the current version in favor of the new HTable/HTableInterface version?
Would that be too disruptive?

We can keep the existing interface, deprecate it, and add the new one in HTable, making it
call the old one.
Then, in the future, remove it from HConnection and move the code into HTable.

I've done it in v10.
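The migration path above (keep the method on HConnection but deprecate it, add a new method on HTable that calls the old one, and later move the implementation into HTable) is the usual deprecate-and-delegate pattern. Below is a minimal sketch under stated assumptions: `BatchConnection`, `BatchTable` and `BatchCallback` are hypothetical, simplified stand-ins, and their parameter lists are not the real `processBatchCallback` signature.

```java
import java.util.List;

/** Hypothetical stand-in for HConnection; not the real HBase interface. */
interface BatchConnection {
    /**
     * @deprecated kept for compatibility; callers should move to
     * BatchTable#processBatchCallback.
     */
    @Deprecated
    void processBatchCallback(List<String> actions, String tableName,
                              Object[] results, BatchCallback callback);
}

/** Hypothetical per-action callback, invoked as each result arrives. */
interface BatchCallback {
    void update(int index, Object result);
}

/** Hypothetical stand-in for HTable. */
class BatchTable {
    private final BatchConnection connection;
    private final String tableName;

    BatchTable(BatchConnection connection, String tableName) {
        this.connection = connection;
        this.tableName = tableName;
    }

    /**
     * New entry point on the table. For now it only delegates to the deprecated
     * connection method, so behaviour is unchanged; later the implementation can
     * move here and the connection method can be removed.
     */
    @SuppressWarnings("deprecation")
    void processBatchCallback(List<String> actions, Object[] results, BatchCallback callback) {
        connection.processBatchCallback(actions, tableName, results, callback);
    }
}
```

Delegating in this direction keeps a single implementation during the transition, so existing HConnection callers and new HTable callers see identical behaviour until the code is moved.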

bq. Any way you can add tests to prove your claims of improvement above (it's hard to review
for that)?

It's hard. Testing that we resubmit immediately instead of waiting for all results is difficult
without adding sleeps and/or mocking a lot of things, because the change is not visible outside
the method: its interface is unchanged, only the internal algorithm differs.

Functionally, it's tested through testRegionCaching (with some extra checks added in this
patch), which shows that:
- it works in the nominal case (and you can't even start the mini cluster if the nominal case
does not work)
- it retries when one RS fails
- it stops retrying once the maximum number of retries is reached, and throws the right exception
with the right content
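The retry behaviour listed above (retry when a server fails, give up after the maximum number of attempts, report exactly which actions failed), combined with the immediate resubmission proposed in the issue description quoted below, can be sketched with `java.util.concurrent.ExecutorCompletionService`, which returns each task as soon as it finishes, in completion order. This is a minimal sketch, not the patch's code: `Action`, `RetriesExhausted`, `runAll`, `maxAttempts` and `pauseMillis` are hypothetical names, and the real `processBatchCallback` also regroups actions per region server and refreshes the location cache, which is omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;

/** Hypothetical sketch: resubmit a failed action as soon as its error is seen. */
final class ImmediateResubmitSketch {

    /** Hypothetical unit of work; throwing means a retriable error (region moved, dead server...). */
    interface Action {
        String call() throws Exception;
    }

    /** Hypothetical exception naming the actions that still failed after the last attempt. */
    static final class RetriesExhausted extends Exception {
        final List<Integer> failedIndexes;

        RetriesExhausted(List<Integer> failedIndexes) {
            super("retries exhausted for actions " + failedIndexes);
            this.failedIndexes = failedIndexes;
        }
    }

    /** Outcome of one attempt of one action. */
    private static final class Outcome {
        final int index;
        final int attempt;
        final String value;
        final Exception error;

        Outcome(int index, int attempt, String value, Exception error) {
            this.index = index;
            this.attempt = attempt;
            this.value = value;
            this.error = error;
        }
    }

    static String[] runAll(List<Action> actions, int maxAttempts, long pauseMillis,
                           ExecutorService pool) throws InterruptedException, RetriesExhausted {
        CompletionService<Outcome> cs = new ExecutorCompletionService<>(pool);
        String[] results = new String[actions.size()];
        List<Integer> failed = new ArrayList<>();
        int pending = 0;
        for (int i = 0; i < actions.size(); i++) {
            submit(cs, actions.get(i), i, 1, 0);
            pending++;
        }
        while (pending > 0) {
            Outcome o;
            try {
                o = cs.take().get(); // whichever attempt finishes first, in any order
            } catch (ExecutionException e) {
                throw new IllegalStateException(e); // not expected: tasks catch their own errors
            }
            pending--;
            if (o.error == null) {
                results[o.index] = o.value;
            } else if (o.attempt < maxAttempts) {
                // Resubmit now, without waiting for the other actions to finish.
                submit(cs, actions.get(o.index), o.index, o.attempt + 1, pauseMillis);
                pending++;
            } else {
                failed.add(o.index); // out of attempts: report it at the end
            }
        }
        if (!failed.isEmpty()) {
            throw new RetriesExhausted(failed);
        }
        return results;
    }

    private static void submit(CompletionService<Outcome> cs, Action action, int index,
                               int attempt, long pauseMillis) {
        cs.submit(() -> {
            try {
                if (pauseMillis > 0) {
                    Thread.sleep(pauseMillis); // retry pause, taken on the worker thread
                }
                return new Outcome(index, attempt, action.call(), null);
            } catch (Exception e) {
                return new Outcome(index, attempt, null, e);
            }
        });
    }
}
```

The pause before a retry runs on the worker thread, so the collecting loop keeps consuming other results while a failed action waits to be resubmitted.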

As for performance in the nominal case, unfortunately it does not make a big difference.
The code is cleaner, but the tests I ran show that the gain is small compared to the rest of the execution time.

> In the client code, don't wait for all the requests to be executed before resubmitting
a request in error.
> ----------------------------------------------------------------------------------------------------------
>                 Key: HBASE-5924
>                 URL: https://issues.apache.org/jira/browse/HBASE-5924
>             Project: HBase
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>         Attachments: 5924.v5.patch, 5924.v9.patch
> The client (in the function HConnectionManager#processBatchCallback) works in two steps:
>  - make the requests
>  - collect the failures and successes and prepare for retry
> It means that when there is an immediate error (region moved, split, dead server, ...)
we still wait for all the initial requests to be executed before resubmitting the failed
request. If all the requests take 5 seconds, the final execution
time is: 5 (initial requests) + 1 (wait time) + 5 (final request) = 11s.
> We could improve this by analyzing the results immediately. For the scenario above, this
would bring the total down to 6 seconds: 1 (wait time) + 5 (resubmitted request), overlapping with the other requests.
> So we could get a performance improvement of nearly 50% in many cases, and much more
than 50% if the request execution times differ.
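To make the arithmetic above concrete, here is a toy timing model of the two strategies. It is a hypothetical illustration, not HBase code: `RetryTimingModel`, `twoStep` and `immediate` are invented names, and it assumes a failed request reports its error after `durations[i]` seconds, waits a fixed `pause`, then needs `retry[i]` seconds to succeed.

```java
import java.util.Arrays;

/** Hypothetical toy model of total batch time, in seconds; not HBase code. */
final class RetryTimingModel {

    /** Two-step strategy: wait for every initial request, then pause, then run all retries. */
    static double twoStep(double[] durations, boolean[] failed, double[] retry, double pause) {
        double firstRound = Arrays.stream(durations).max().orElse(0);
        double secondRound = 0;
        boolean anyFailed = false;
        for (int i = 0; i < durations.length; i++) {
            if (failed[i]) {
                anyFailed = true;
                secondRound = Math.max(secondRound, retry[i]);
            }
        }
        return anyFailed ? firstRound + pause + secondRound : firstRound;
    }

    /** Immediate strategy: each failed request is retried as soon as its own error comes back. */
    static double immediate(double[] durations, boolean[] failed, double[] retry, double pause) {
        double end = 0;
        for (int i = 0; i < durations.length; i++) {
            double done = failed[i] ? durations[i] + pause + retry[i] : durations[i];
            end = Math.max(end, done);
        }
        return end;
    }
}
```

With two requests succeeding after 5 s and a third failing at once, a 1 s pause and a 5 s retry (`durations = {5, 5, 0}`, `failed = {false, false, true}`, `retry = {0, 0, 5}`, `pause = 1`), `twoStep` gives 5 + 1 + 5 = 11 and `immediate` gives max(5, 0 + 1 + 5) = 6, the figures quoted in the issue.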
