hbase-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3787) Increment is non-idempotent but client retries RPC
Date Wed, 08 May 2013 00:19:17 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651478#comment-13651478 ]

Sergey Shelukhin commented on HBASE-3787:
-----------------------------------------

There are some design questions on the review. Perhaps we should flesh out the design before I make
any major changes.

1) Should we add actual usage of nonceGroup/client ID?
We can do that. It also depends on (2). I will probably change the server manager to lump the nonce
group and nonce into an array wrapper and store these in the map,
instead of using a pair. A pair is simpler but worse; right now I only added it for forward compatibility.
A map of maps is a pain to clean up without tricks or an epic lock; I have added that for sequential
nonces, but I wonder if it's worth it for simple nonces.
The client ID, for now, will be produced from the IP, process id, and thread id. It will be hashed
to 8 bytes and written into nonceGroup (see the sketch below).
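
For illustration, a minimal sketch of how such an 8-byte ID could be derived. The class and method
names here are made up, and MD5 is just one possible hash, not what the patch prescribes:

{code:java}
import java.lang.management.ManagementFactory;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Hypothetical sketch: derive an 8-byte client ID (nonceGroup) from
// IP address, process id, and thread id, as described above.
public final class ClientNonceGroup {
  public static long clientId() throws Exception {
    String seed = InetAddress.getLocalHost().getHostAddress()
        + ":" + ManagementFactory.getRuntimeMXBean().getName() // "pid@host"
        + ":" + Thread.currentThread().getId();
    byte[] digest = MessageDigest.getInstance("MD5")
        .digest(seed.getBytes(StandardCharsets.UTF_8));
    // Fold the first 8 bytes of the digest into a long.
    long id = 0;
    for (int i = 0; i < 8; i++) {
      id = (id << 8) | (digest[i] & 0xFF);
    }
    return id;
  }
}
{code}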

2) Is 8 bytes enough to avoid collisions?
The answer is "maybe". It depends on the overall number of requests in the cluster and on how
long we store nonces.
We can alleviate this by adding the client ID, I guess, which would make it 16 bytes: 8 unique per
client and 8 random.
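
A quick birthday-bound sanity check, assuming uniformly random 64-bit nonces and the one-hour
retention figure from (4) below:

{code:java}
// Back-of-the-envelope birthday bound for 8-byte (64-bit) nonces:
// p(collision) ~ 1 - exp(-n^2 / (2 * 2^64)). Illustration only.
public final class NonceCollisionOdds {
  public static void main(String[] args) {
    double d = Math.pow(2, 64);  // size of the 64-bit nonce space
    long n = 36_000_000L;        // nonces retained (1 hour at 10k/s, see item 4)
    double p = 1 - Math.exp(-((double) n * n) / (2 * d));
    System.out.printf("p(collision) ~ %.2e%n", p); // ~3.5e-5 for these numbers
  }
}
{code}

With the extra 8 bytes of client ID, a collision would additionally have to happen within a single
client's nonces, which shrinks the odds much further.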

3) What Random should we use?
Java uses SecureRandom to generate UUIDs. We could use some other Random implementation; they
claim to produce uniformly distributed numbers.
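
For reference, a sketch of the two obvious options. UUID.randomUUID() does use SecureRandom
internally; ThreadLocalRandom is one cheaper alternative, not something the patch mandates:

{code:java}
import java.security.SecureRandom;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative only: two ways to draw a 64-bit nonce.
public final class NonceSource {
  private static final SecureRandom SECURE = new SecureRandom();

  static long secureNonce() {
    return SECURE.nextLong(); // slower, cryptographically unpredictable
  }

  static long fastNonce() {
    return ThreadLocalRandom.current().nextLong(); // faster, uniform, not crypto-strong
  }
}
{code}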

4) Will too many nonces be stored?
If we keep nonces for an hour and do 10k increments per second per server, we will have stored
36,000,000 nonces on a server.
With map overhead, 2 object overheads, 2 primitive longs, and an enum value, it's probably
in excess of 120 bytes per entry (without clientId). So yes, it's a lot of memory.
The time to store nonces is configurable, though, and with the default retry settings as little
as 5 minutes could provide sufficient safety.
With 5 minutes we'd have something like ~400MB of RAM for the hash table, which is not totally
horrible (especially for 10k QPS :)). A back-of-the-envelope check is below.
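
Back-of-the-envelope, using the assumed ~120 bytes/entry from above (actual JVM overhead will vary):

{code:java}
// Rough memory estimate for the retained-nonce map, using the numbers above.
public final class NonceMemoryEstimate {
  public static void main(String[] args) {
    long qps = 10_000;           // increments/sec/server
    long bytesPerEntry = 120;    // map entry + 2 objects + 2 longs + enum
    long hour = 3600, fiveMin = 300;
    System.out.println("1 hour: " + qps * hour * bytesPerEntry / (1024 * 1024) + " MB");
    System.out.println("5 min:  " + qps * fiveMin * bytesPerEntry / (1024 * 1024) + " MB");
  }
}
// prints roughly 4119 MB for an hour, 343 MB for five minutes
{code}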
Some solutions were proposed in the review, such as storing the mutation creation time and
rejecting requests after a certain time.
However, that relies on synchronized clocks, and it also doesn't solve the problem in the sense
that the client has no idea what happened to the original request - should it retry?
If you think this is a realistic workload, I can rework the sequential-nonce patch instead; there,
nonces would be collapsed. If clientId is used and incorporates the region, requests arriving
for the same region will generally go to the same server for some time, and in sequential order,
so a lot can be collapsed.
However, it will add complexity.
What do you think?
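
To make the flat-map alternative from (1) concrete, here is a minimal sketch (hypothetical names,
not the actual patch code) of a single map keyed by (nonceGroup, nonce) with time-based eviction:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the server-side dedup structure discussed above: one flat map
// instead of a map of maps, so cleanup is a single pass with no epic lock.
public final class ServerNonceCache {
  private static final class Key {
    final long group, nonce;
    Key(long group, long nonce) { this.group = group; this.nonce = nonce; }
    @Override public int hashCode() { return Long.hashCode(group * 31 + nonce); }
    @Override public boolean equals(Object o) {
      return o instanceof Key && ((Key) o).group == group && ((Key) o).nonce == nonce;
    }
  }

  private final Map<Key, Long> seen = new ConcurrentHashMap<>();
  private final long ttlMillis; // "time to store nonces", configurable

  public ServerNonceCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

  /** Returns true if this (group, nonce) pair is new; false if it is a retry. */
  public boolean startOperation(long group, long nonce) {
    return seen.putIfAbsent(new Key(group, nonce), System.currentTimeMillis()) == null;
  }

  /** Periodic cleanup chore: drop entries older than the configured TTL. */
  public void evictExpired() {
    long cutoff = System.currentTimeMillis() - ttlMillis;
    seen.values().removeIf(insertedAt -> insertedAt < cutoff);
  }
}
{code}

The eviction chore trades a periodic full scan for not having to clean up per-group sub-maps.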

                
> Increment is non-idempotent but client retries RPC
> --------------------------------------------------
>
>                 Key: HBASE-3787
>                 URL: https://issues.apache.org/jira/browse/HBASE-3787
>             Project: HBase
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 0.94.4, 0.95.2
>            Reporter: dhruba borthakur
>            Assignee: Sergey Shelukhin
>            Priority: Critical
>             Fix For: 0.95.1
>
>         Attachments: HBASE-3787-partial.patch, HBASE-3787-v0.patch, HBASE-3787-v1.patch, HBASE-3787-v2.patch
>
>
> The HTable.increment() operation is non-idempotent. The client retries the increment
> RPC a few times (as specified by configuration) before throwing an error to the application.
> This makes it possible for the same increment call to be applied twice at the server.
> For increment operations, is it better to use HConnectionManager.getRegionServerWithoutRetries()?
> Another option would be to enhance the IPC module so that the RPC server correctly identifies
> whether the RPC is a retry attempt and handles it accordingly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
