hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10277) refactor AsyncProcess
Date Mon, 20 Jan 2014 19:29:25 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876760#comment-13876760

Sergey Shelukhin commented on HBASE-10277:

bq. Could AsyncRequests be done as a 'Future'? Seems to have a bunch in common.
I'll take a look at it... it's more like a multi-future. Maybe even FutureTask can be used.

bq. We have something called Set but it does not implement java.util.Set:
Will fix.

bq. We make an ap and a multiap each time?
Once per HTable. The difference is the mode of operation - the "legacy" mode or the normal

bq. Batch should return an array of objects?
It does? Don't quite understand the comment. Some of the existing interfaces accept the array
for results that fill it. Backward compat.

bq. 1) Some changes are cosmetics: some protected becomes private, some "this." are removed.
I'm not against these changes, but it makes the real meat more difficult to find.
Removing "this." is not cosmetic (at least in the cases I am aware of) - methods moved to
a non-static nested class, there's no more "this.".
Changing to private can be removed, although it's a good change to have.

bq. 2) The javadoc has not been updated, so when the code differs from the javadoc, the reader
has to sort out himself if it's just that the javadoc is now out dated or if there is a regression.

Will update.

bq. AsyncProcess#submit. Why does it take a tableName? Does it mean that an AsyncProcess can
now be shared between Tables?

bq. AsyncRequestSet#waitUntilDone
bq. Same responsibility as AsyncProcess#waitUntilDone, but less features (no logs. These logs
are useful).
Some logs were preserved. 
Previous waitUntilDone had two wait conditions and loops in an effort not to loop often, whereas
only one seems to be necessary, so I don't think features were lost... I can add more logs.

bq. This part should go in HConnection, as we should manage the load tracking globally and
not only for a single call. Iit would be a change in bahavior compared to the 0.94, but I
think we should do it. Would it make you life easier here?
It may, actually... but HTable uses AP directly. In fact batch calls from HCM currently don't
use throttling at all (it calls "submitAll"), so only HTable-direct-usage uses this code.

bq. Probably this perf difference will not be noticeable on real requests (remains to be tested).
Let me be more pessimistic here .
Yeah, I'd probably need to test. I was not able to figure out what exactly is the cause of
slowdown - YourKit gives very low % numbers for AP, CPU or wall clock, and almost identical
between old and equivalent new methods across runs.

bq. Also got rid of callback that was mostly used for tests, tests can check results without
I'm not a big fan of this part of the change. Callbacks can be reused in different context
(for example to have a different policy, such as ignoring errors as in HTableMultiplexer).
As well, we now have a hardRetryLimit, but this attribute is used only in tests.
Callback was already only used for tests, for practical purposes (also for array filling,
but that is no longer necessary).
I am not a big fan of having test-oriented code in production code; thus I replaced the only
necessary test usage (stopping retries) with a field. I wanted to get rid of that too, but
that would be too convoluted it seems.
When we have a scenario to use some callback, we can add it, under YAGNI principle :)

bq. More globally, This patch allows to reuse a single AsyncProcess between independant streams
of writes. Would that be necessary if it was cheaper to create ? The cost is reading the configuration,
as when we do a HTable#get and create a RegionServerCallable.
The main reason is to have well-defined context for single call for "normal" (as opposed to
HTable cross-put stuff) usage pattern. So for example replica calls could group requests differently
and synchronize/populate the same result, cancel other calls when they are no longer useful,
It also separates the two patterns better (by ctor argument); see potential problems outlined
in JIRA description.

bq. The problem is that with this patch, we still create a AsyncProcess in some cases, for
example on the batchCallback path...
Yeah, that is due to ability to pass a custom execution pool (could be changed to accomodate
it by making pool per request, but I am not sure it's necessary...) and limitations of java
generics + current result type handling (AP will have to be per result type). IIRC most of
these paths are deprecated.

> refactor AsyncProcess
> ---------------------
>                 Key: HBASE-10277
>                 URL: https://issues.apache.org/jira/browse/HBASE-10277
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HBASE-10277.patch
> AsyncProcess currently has two patterns of usage, one from HTable flush w/o callback
and with reuse, and one from HCM/HTable batch call, with callback and w/o reuse. In the former
case (but not the latter), it also does some throttling of actions on initial submit call,
limiting the number of outstanding actions per server.
> The latter case is relatively straightforward. The former appears to be error prone due
to reuse - if, as javadoc claims should be safe, multiple submit calls are performed without
waiting for the async part of the previous call to finish, fields like hasError become ambiguous
and can be used for the wrong call; callback for success/failure is called based on "original
index" of an action in submitted list, but with only one callback supplied to AP in ctor it's
not clear to which submit call the index belongs, if several are outstanding.
> I was going to add support for HBASE-10070 to AP, and found that it might be difficult
to do cleanly.
> It would be nice to normalize AP usage patterns; in particular, separate the "global"
part (load tracking) from per-submit-call part.
> Per-submit part can more conveniently track stuff like initialActions, mapping of indexes
and retry information, that is currently passed around the method calls.
> -I am not sure yet, but maybe sending of the original index to server in "ClientProtos.MultiAction"
can also be avoided.- Cannot be avoided because the API to server doesn't have one-to-one
correspondence between requests and responses in an individual call to multi (retries/rearrangement
have nothing to do with it)

This message was sent by Atlassian JIRA

View raw message