hbase-issues mailing list archives

From "ryan rawson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2898) MultiPut makes proper error handling impossible and leads to corrupted data
Date Sat, 14 Aug 2010 00:54:16 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898486#action_12898486 ]

ryan rawson commented on HBASE-2898:
------------------------------------

I am interested in a new multi-put for 0.90.  There are also use cases for multi-get and multi
just about everything else; see HBASE-1845.

Improving serialization would be nice, but we'd have to wreck our KeyValue serialization mechanism
and have something custom on the wire.  There is a similar situation in the Result serialization
as well; it's just KeyValues all the way down.

> MultiPut makes proper error handling impossible and leads to corrupted data
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-2898
>                 URL: https://issues.apache.org/jira/browse/HBASE-2898
>             Project: HBase
>          Issue Type: Bug
>          Components: client, regionserver
>    Affects Versions: 0.89.20100621
>            Reporter: Benoit Sigoure
>            Priority: Blocker
>             Fix For: 0.90.0
>
>
> The {{MultiPut}} RPC needs to be completely rewritten.  Let's see why step by step.
> # An HBase user calls any of the {{put}} methods on an {{HTable}} instance.
> # Eventually, {{HTable#flushCommits}} is invoked to actually send the edits to the RegionServer(s).
> # This takes us to {{HConnectionManager#processBatchOfPuts}}, where all edits are sorted into one or more {{MultiPut}}.  Each {{MultiPut}} aggregates all the edits going to a particular RegionServer.
> # A thread pool is used to send all the {{MultiPut}} in parallel to their respective
RegionServer.  Let's follow what happens for a single {{MultiPut}}.
> # The {{MultiPut}} travels through the IPC code on the client and then through the network
and then through the IPC code on the RegionServer.
> # We're now in {{HRegionServer#multiPut}} where a new {{MultiPutResponse}} is created.
> # Still in {{HRegionServer#multiPut}}.  Since a {{MultiPut}} is essentially a map from
region name to a list of {{Put}} for that region, there's a {{for}} loop that executes each
list of {{Put}} for each region sequentially.  Let's follow what happens for a single list
of {{Put}} for a particular region.
> # We're now in {{HRegionServer#put(byte[], List<Put>)}}.  Each {{Put}} is associated
with the row lock that was specified by the client (if any).  Then the pairs of {{(Put, lock
id)}} are handed to the right {{HRegion}}.
> # Now we're in {{HRegion#put(Pair<Put, Integer>[])}}, which immediately takes us
to {{HRegion#doMiniBatchPut}}.
> # At this point, let's assume that we're doing just 2 edits, so the {{BatchOperationInProgress}} that {{doMiniBatchPut}} operates on contains just 2 {{Put}}.
> # The {{while}} loop in {{doMiniBatchPut}} that's going to execute each {{Put}} starts.
> # The first {{Put}} fails because an exception is thrown when appending the edit to the
{{WAL}}.  Its {{batchOp.retCodes}} is marked as {{OperationStatusCode.FAILURE}}.
> # Because there was an exception, we're back to {{HRegion#put(Pair<Put, Integer>[])}}
where the {{while}} loop will test that {{batchOp.isDone}} is {{false}} and do another iteration.
> # {{doMiniBatchPut}} is called again and handles the remaining {{Put}}.
> # The second {{Put}} succeeds normally, so its {{batchOp.retCodes}} is marked as {{OperationStatusCode.SUCCESS}}.
> # {{doMiniBatchPut}} is done and returns to {{HRegion#put(Pair<Put, Integer>[])}},
which returns to {{HRegionServer#put(byte[], List<Put>)}}.
> # At this point, {{HRegionServer#put(byte[], List<Put>)}} does a {{for}} loop and
extracts the index of the *first* {{Put}} that failed out of the {{OperationStatusCode[]}}.
 In our case, it'll return 0 since the first {{Put}} failed.
> # This index of the first failed {{Put}} in the list (0 in this case) is returned to {{HRegionServer#multiPut}}, which records it in the {{MultiPutResponse}} - so the client knows that the first {{Put}} failed but has no idea about the other one (see the sketch below).
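> To make the loss of information concrete, here is a minimal, self-contained sketch of the "first failed index only" behaviour described in the last two steps (hypothetical code written for this description, not the actual HBase source):
> {code:java}
> // Hypothetical simplification: the per-Put status array is collapsed into the
> // index of the first failure, so the outcome of every later Put is discarded.
> final class FirstFailureOnly {
>   enum OperationStatusCode { SUCCESS, FAILURE }
>
>   static int firstFailedIndex(OperationStatusCode[] codes) {
>     for (int i = 0; i < codes.length; i++) {
>       if (codes[i] == OperationStatusCode.FAILURE) {
>         return i;  // statuses after this index are never reported to the client
>       }
>     }
>     return -1;     // convention used here: -1 means "everything succeeded"
>   }
>
>   public static void main(String[] args) {
>     // Scenario above: Put #0 fails on the WAL append, Put #1 succeeds.
>     OperationStatusCode[] codes =
>         { OperationStatusCode.FAILURE, OperationStatusCode.SUCCESS };
>     // Prints 0: the caller learns about the first failure only and cannot tell
>     // whether Put #1 was applied.
>     System.out.println(firstFailedIndex(codes));
>   }
> }
> {code}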
> So the client has no reliable way of knowing which {{Put}} failed (if any) past the first failure.  All it knows is that, for a particular region, the {{Put}} succeeded up to a particular index, at which point there was a failure, and the remaining ones may or may not have succeeded.  Its best bet is to retry every {{Put}} past the index of the first failure for this region.
 But this has an unintended consequence.  The {{Put}} that were successful during the first
run will be *re-applied*.  This will unexpectedly create extra versions.  Now I realize most
people don't really care about versions, so they won't notice.  But whoever relies on the
versions for whatever reason will rightfully consider this to be data corruption.
> As it is now, {{MultiPut}} makes proper error handling impossible.  Since this RPC cannot
guarantee any atomicity other than at the individual {{Put}} level, it should return to the
client specific information about which {{Put}} failed in case of a failure, so that the client
can do proper error handling.
> This requires us to change the {{MultiPutResponse}} so that it can indicate which {{Put}}
specifically failed.  We could do this for instance by giving the index of the {{Put}} along
with its {{OperationStatusCode}}.  So in the scenario above, the {{MultiPutResponse}} would
essentially return something like: "for that particular region, put #0 failed, put #1 succeeded".
 If we want to save a bit of space, we may want to omit the successes from the response and
only mention the failures - so a response that doesn't mention any failure means that everything
was successful.  Not sure whether that's a good idea though.
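> As a rough illustration of that idea (the class and method names below are invented for this sketch, not an existing HBase API), the failures-only variant could be shaped like this:
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
>
> // Hypothetical response shape, invented for illustration: per region, record the
> // index of each failed Put together with its status; any index that is absent is
> // treated as a success (the "only mention the failures" variant).
> final class PerPutFailureResponse {
>   enum OperationStatusCode { SUCCESS, FAILURE }
>
>   // region name -> (index of the failed Put within that region's list -> status)
>   private final Map<String, Map<Integer, OperationStatusCode>> failures =
>       new HashMap<String, Map<Integer, OperationStatusCode>>();
>
>   void addFailure(String region, int putIndex, OperationStatusCode code) {
>     Map<Integer, OperationStatusCode> perRegion = failures.get(region);
>     if (perRegion == null) {
>       perRegion = new HashMap<Integer, OperationStatusCode>();
>       failures.put(region, perRegion);
>     }
>     perRegion.put(putIndex, code);
>   }
>
>   // The client would only retry the Puts whose indexes appear here.
>   boolean succeeded(String region, int putIndex) {
>     Map<Integer, OperationStatusCode> perRegion = failures.get(region);
>     return perRegion == null || !perRegion.containsKey(putIndex);
>   }
> }
> {code}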
> Since doing this requires an incompatible RPC change, I propose that we take the opportunity to rewrite the {{MultiPut}} RPC too.  Right now it's very inefficient; it's just a hack on top of {{Put}}.  When a {{MultiPut}} is written to the wire, a lot of unnecessary duplicate data is sent out.  The timestamp, the row key and the family are sent over the wire N+1 times, where N is the number of edits for a particular row, instead of just once (!).
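> To see the duplication from the client side (this uses the same client API as the test case further down; the exact byte layout is the 0.89-era Writable serialization, so treat the counts as indicative):
> {code:java}
> import org.apache.hadoop.hbase.client.Put;
>
> // Illustration of the N+1 duplication described above: every column added to a
> // Put becomes its own KeyValue, and every KeyValue carries its own copy of the
> // row key, family and timestamp when the Put is written to the wire.
> final class WireDuplication {
>   public static void main(String[] args) {
>     final int columns = 5;
>     final Put p = new Put("somerow".getBytes());
>     for (int i = 0; i < columns; i++) {
>       // Five edits on the same row and family...
>       p.add("myfam".getBytes(), ("qual" + i).getBytes(), "value".getBytes());
>     }
>     // ...yet "somerow", "myfam" and the timestamp are serialized once per
>     // KeyValue plus once for the Put itself: columns + 1 = 6 copies instead of 1.
>   }
> }
> {code}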
> Alternatively, if we don't want to change the RPCs, we can fix this issue in a backward
compatible way by making the {{while}} loop in {{HRegion#put(Pair<Put, Integer>[])}}
stop as soon as a failure is encountered.
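> Sketched very roughly (simplified stand-in logic, not the actual {{HRegion}} code), the backward-compatible variant amounts to:
> {code:java}
> import java.util.Arrays;
>
> // Rough, simplified sketch of the backward-compatible variant: stop the batch at
> // the first failure instead of continuing, so "index of the first failure" really
> // means "nothing at or after this index was applied", and a retry from that index
> // no longer re-applies edits that already succeeded.
> final class StopAtFirstFailure {
>   enum OperationStatusCode { NOT_RUN, SUCCESS, FAILURE }
>
>   // 'puts' stands in for the per-region list of Put in the batch.
>   static void processBatch(Runnable[] puts, OperationStatusCode[] retCodes) {
>     Arrays.fill(retCodes, OperationStatusCode.NOT_RUN);
>     for (int i = 0; i < puts.length; i++) {
>       try {
>         puts[i].run();                              // apply one edit
>         retCodes[i] = OperationStatusCode.SUCCESS;
>       } catch (RuntimeException e) {
>         retCodes[i] = OperationStatusCode.FAILURE;
>         return;                                     // leave the rest as NOT_RUN
>       }
>     }
>   }
> }
> {code}
> With this ordering guarantee, the existing single-index response stays meaningful, at the cost of abandoning the rest of the batch on the first error.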
> Simple test case to highlight the bug (against HBase trunk): it writes to a column family ({{badfam}}) that doesn't exist in {{mytable}}, so the {{Put}} always fails and the HBase client retries hopelessly until it reaches the maximum number of attempts before bailing out.
> {code}
> hbase(main):001:0> describe 'mytable'
> DESCRIPTION                                                  ENABLED
>  {NAME => 'mytable', FAMILIES => [{NAME => 'myfam',          true
>  BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
>  COMPRESSION => 'NONE', VERSIONS => '1',
>  TTL => '2147483647', BLOCKSIZE => '65536',
>  IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> 1 row(s) in 0.7760 seconds
> {code}
> {code:java}
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.client.Put;
> final class put {
>   public static void main(String[] args) throws Exception {
>     final HTable t = new HTable(HBaseConfiguration.create(), "mytable");
>     final Put p = new Put("somerow".getBytes());
>     p.add("badfam".getBytes(), "qual2".getBytes(), "badvalue".getBytes());
>     t.put(p);
>     t.flushCommits();
>   }
> }
> {code}
> When running {{put}}:
> {noformat}
> HConnectionManager$TableServers: locateRegionInMeta(parentTable=.META., tableName=mytable,
row=[] (""), useCache=true)
> HConnectionManager$TableServers: locateRegionInMeta(parentTable=-ROOT-, tableName=.META.,
row=[109, 121, 116, 97, 98, 108, 101, 44, 44, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57, 57,
57, 57, 57] ("mytable,,99999999999999"), useCache=true)
> HConnectionManager$TableServers: Found ROOT at 127.0.0.1:54754
> HConnectionManager$TableServers: Cached location for .META.,,1.1028785192 is localhost.:54754
> HConnectionManager$TableServers: Cached location for mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.
is localhost.:54754
> HConnectionManager$TableServers: Failed past 0 for region: mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.,
removing from cache
> HConnectionManager$TableServers: Removed mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.
for tableName=mytable from cache because of somerow
> HConnectionManager$TableServers: processBatchOfPuts had some failures, sleeping for 1000
ms!
> HConnectionManager$TableServers: locateRegionInMeta(parentTable=.META., tableName=mytable,
row=[115, 111, 109, 101, 114, 111, 119] ("somerow"), useCache=true)
> HConnectionManager$TableServers: Cached location for mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.
is localhost.:54754
> HConnectionManager$TableServers: Failed past 0 for region: mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.,
removing from cache
> HConnectionManager$TableServers: Removed mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.
for tableName=mytable from cache because of somerow
> HConnectionManager$TableServers: processBatchOfPuts had some failures, sleeping for 1000
ms!
> HConnectionManager$TableServers: locateRegionInMeta(parentTable=.META., tableName=mytable,
row=[115, 111, 109, 101, 114, 111, 119] ("somerow"), useCache=true)
> HConnectionManager$TableServers: Cached location for mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.
is localhost.:54754
> HConnectionManager$TableServers: Failed past 0 for region: mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.,
removing from cache
> HConnectionManager$TableServers: Removed mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.
for tableName=mytable from cache because of somerow
> HConnectionManager$TableServers: processBatchOfPuts had some failures, sleeping for 1000
ms!
> HConnectionManager$TableServers: locateRegionInMeta(parentTable=.META., tableName=mytable,
row=[115, 111, 109, 101, 114, 111, 119] ("somerow"), useCache=true)
> HConnectionManager$TableServers: Cached location for mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.
is localhost.:54754
> HConnectionManager$TableServers: Failed past 0 for region: mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.,
removing from cache
> HConnectionManager$TableServers: Removed mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.
for tableName=mytable from cache because of somerow
> HConnectionManager$TableServers: processBatchOfPuts had some failures, sleeping for 2000
ms!
> HConnectionManager$TableServers: locateRegionInMeta(parentTable=.META., tableName=mytable,
row=[115, 111, 109, 101, 114, 111, 119] ("somerow"), useCache=true)
> HConnectionManager$TableServers: Cached location for mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.
is localhost.:54754
> HConnectionManager$TableServers: Failed past 0 for region: mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.,
removing from cache
> HConnectionManager$TableServers: Removed mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.
for tableName=mytable from cache because of somerow
> HConnectionManager$TableServers: processBatchOfPuts had some failures, sleeping for 2000
ms!
> HConnectionManager$TableServers: locateRegionInMeta(parentTable=.META., tableName=mytable,
row=[115, 111, 109, 101, 114, 111, 119] ("somerow"), useCache=true)
> HConnectionManager$TableServers: Cached location for mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.
is localhost.:54754
> HConnectionManager$TableServers: Failed past 0 for region: mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.,
removing from cache
> HConnectionManager$TableServers: Removed mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.
for tableName=mytable from cache because of somerow
> HConnectionManager$TableServers: processBatchOfPuts had some failures, sleeping for 4000
ms!
> HConnectionManager$TableServers: locateRegionInMeta(parentTable=.META., tableName=mytable,
row=[115, 111, 109, 101, 114, 111, 119] ("somerow"), useCache=true)
> HConnectionManager$TableServers: Cached location for mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.
is localhost.:54754
> HConnectionManager$TableServers: Failed past 0 for region: mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.,
removing from cache
> HConnectionManager$TableServers: Removed mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.
for tableName=mytable from cache because of somerow
> HConnectionManager$TableServers: processBatchOfPuts had some failures, sleeping for 4000
ms!
> HConnectionManager$TableServers: locateRegionInMeta(parentTable=.META., tableName=mytable,
row=[115, 111, 109, 101, 114, 111, 119] ("somerow"), useCache=true)
> HConnectionManager$TableServers: Cached location for mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.
is localhost.:54754
> HConnectionManager$TableServers: Failed past 0 for region: mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.,
removing from cache
> HConnectionManager$TableServers: Removed mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.
for tableName=mytable from cache because of somerow
> HConnectionManager$TableServers: processBatchOfPuts had some failures, sleeping for 8000
ms!
> HConnectionManager$TableServers: locateRegionInMeta(parentTable=.META., tableName=mytable,
row=[115, 111, 109, 101, 114, 111, 119] ("somerow"), useCache=true)
> HConnectionManager$TableServers: Cached location for mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.
is localhost.:54754
> HConnectionManager$TableServers: Failed past 0 for region: mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.,
removing from cache
> HConnectionManager$TableServers: Removed mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.
for tableName=mytable from cache because of somerow
> HConnectionManager$TableServers: processBatchOfPuts had some failures, sleeping for 16000
ms!
> HConnectionManager$TableServers: locateRegionInMeta(parentTable=.META., tableName=mytable,
row=[115, 111, 109, 101, 114, 111, 119] ("somerow"), useCache=true)
> HConnectionManager$TableServers: Cached location for mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.
is localhost.:54754
> HConnectionManager$TableServers: Failed past 0 for region: mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.,
removing from cache
> HConnectionManager$TableServers: Removed mytable,,1280754761944.db25b4cc63e97e6a7e1bc28402fe57db.
for tableName=mytable from cache because of somerow
> HConnectionManager$TableServers: processBatchOfPuts had some failures, sleeping for 32000
ms!
> Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException:
Still had 1 puts left after retrying 10 times.
> 	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfPuts(HConnectionManager.java:1534)
> 	at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:664)
> 	at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:549)
> 	at org.apache.hadoop.hbase.client.HTable.put(HTable.java:535)
> 	at put.main(put.java:10)
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

