hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Batch puts interrupted ... Requested row out of range for HRegion filestore ...org.apache.hadoop.hbase.client.RetriesExhaustedException:
Date Fri, 06 Aug 2010 22:08:49 GMT

When you run into this problem, it's usually a sign of a META problem:
specifically, you have a 'hole' in the META table.

The META table contains a series of keys like so:
table,start_row1,<timestamp>    [data]
table,start_row2,<timestamp>    [data]


When we search for a region for a given row, we build a key like so:
'table,my_row,9*19' and do a search called 'closestRowBefore'.  This
finds the region that contains this row.

Now notice that we only put the start row in the key. Each region
has a start_row and an end_row, and all the regions are mutually
exclusive and form complete coverage.  Imagine the META row for a region
was missing: we'd consistently find the wrong region (the one just before
the hole), and the regionserver would reject the request (correctly so).
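
To make that concrete, here is a rough sketch of the lookup, not the
actual HBase code. It assumes a simplified META keyed by start row only
(no table name or timestamp), and uses a TreeMap's floorEntry as a
stand-in for 'closestRowBefore'. The class and method names
(MetaLookupSketch, Region, locate, sampleMeta) are made up for
illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class MetaLookupSketch {
    /** Hypothetical stand-in for a region serving rows in [startKey, endKey). */
    static final class Region {
        final String startKey;
        final String endKey; // empty string means "no upper bound"

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        /** The range check a regionserver would apply before accepting a row. */
        boolean contains(String row) {
            return row.compareTo(startKey) >= 0
                && (endKey.isEmpty() || row.compareTo(endKey) < 0);
        }

        @Override
        public String toString() {
            return "Region[" + startKey + ", " + endKey + ")";
        }
    }

    /** Closest-row-before style search: the entry with the greatest start key <= row. */
    static Region locate(TreeMap<String, Region> meta, String row) {
        Map.Entry<String, Region> entry = meta.floorEntry(row);
        return entry == null ? null : entry.getValue();
    }

    /** Three regions with complete coverage: ["", b), [b, d), [d, ""). */
    static TreeMap<String, Region> sampleMeta() {
        TreeMap<String, Region> meta = new TreeMap<>();
        meta.put("", new Region("", "b"));
        meta.put("b", new Region("b", "d"));
        meta.put("d", new Region("d", ""));
        return meta;
    }

    public static void main(String[] args) {
        TreeMap<String, Region> meta = sampleMeta();
        Region found = locate(meta, "c");
        System.out.println(found + " accepts c: " + found.contains("c"));

        // Simulate the hole: the META row for region [b, d) goes missing.
        meta.remove("b");
        Region wrong = locate(meta, "c");
        System.out.println(wrong + " accepts c: " + wrong.contains("c"));
    }
}
```

With full coverage the search lands on the region that holds the row.
Once the entry is gone, the same search falls back to the preceding
region, whose range check fails every time, which matches the repeated
"Requested row out of range" rejections.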

That is what is probably happening here.  Check the table dump in the
master web-ui and see if you can find a 'hole': a place where the end-key
of one region doesn't match the start-key of the next.
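
If you copy the start and end keys out of that dump, the check itself is
mechanical. A minimal sketch, assuming you already have the regions as
{startKey, endKey} pairs sorted by start key (MetaHoleCheckSketch and
findHoles are hypothetical names, and this does not read META itself):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MetaHoleCheckSketch {
    /**
     * Walks adjacent regions (sorted by start key) and reports every place
     * where one region's end key differs from the next region's start key.
     * Each input element is a two-element array: {startKey, endKey}.
     */
    static List<String> findHoles(List<String[]> regions) {
        List<String> holes = new ArrayList<>();
        for (int i = 0; i + 1 < regions.size(); i++) {
            String endKey = regions.get(i)[1];
            String nextStart = regions.get(i + 1)[0];
            if (!endKey.equals(nextStart)) {
                holes.add("[" + endKey + ", " + nextStart + ")");
            }
        }
        return holes;
    }

    public static void main(String[] args) {
        // Region [b, d) is missing, so the end key "b" does not meet "d".
        List<String[]> regions = Arrays.asList(
            new String[] {"", "b"},
            new String[] {"d", ""});
        System.out.println(findHoles(regions));
    }
}
```

Any mismatch it reports is a range of rows that META cannot route
correctly. Note that it flags overlaps as mismatches too, not just gaps.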

If that is the case, there is a script add_table.rb which is used to
fix these things.


On Fri, Aug 6, 2010 at 2:59 PM, Stuart Smith <stu24mail@yahoo.com> wrote:
> Hello,
>  I'm running hbase 0.20.5, and seeing Puts() fail repeatedly when trying to insert a
> specific item into the database.
> Client side I see:
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact region server
> Some server, retryOnlyOne=true, index=0, islastrow=true, tries=9, numtries=10, i=0, listsize=1,
> for region filestore,
> I then looked up which node was hosting the given region (filestore,bdfa9f2173033330cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b)
> on the gui, found the following debug message in the regionserver log:
> 2010-08-06 14:23:47,414 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: Batch
> puts interrupted at index=0 because:Requested row out of range for HRegion filestore,bdfa9f2173033330cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b,1279604506836,
> startKey='bdfa9f2173033330cfae81ece08f75f0002bf3f3a54cde6bbf9192f0187e275b', getEndKey()='be0bc7b3f8bc2a30910b9c758b47cdb730a4691e93f92abb857a2dcc7aefa633',
> Which appears to be coming from:
> /regionserver/HRegionServer.java:1786:      LOG.debug("Batch puts interrupted at index="
> + i + " because:" +
> Which is coming from:
> ./java/org/apache/hadoop/hbase/regionserver/HRegion.java:1658:      throw new WrongRegionException("Requested
> row out of range for " +
> This happens repeatedly on a specific item over at least a day or so, even when not much
> is happening with the cluster.
> As far as I can tell, it looks like the logic to select the correct region for a given
> row is wrong. The row is indeed not in the correct range (at least from what I can tell of
> the exception thrown), and the check in HRegion.java:1658:
>  /** Make sure this is a valid row for the HRegion */
>  private void checkRow(final byte [] row) throws IOException {
>    if(!rowIsInRange(regionInfo, row)) {
> Is correctly rejecting the Put().
> So it appears the error would be somewhere in:
> HRegion.java:1550:
>  private void put(final Map<byte [],List<KeyValue>> familyMap,
>      boolean writeToWAL) throws IOException {
> Which appears to be the actual guts of the insert operation.
> However, I don't know enough about the design of HRegions to really decipher this method.
> I'll dig into it more, but I thought it might be more efficient just to ask you guys first.
> Any ideas?
> I can update to 0.20.6, but I don't see any fixed JIRAs on 0.20.6 that seem related.
> I could be wrong. I'm not sure what I should do next. Any more information you guys need?
> Note that I am inserting files into the database, and using each file's sha256sum as the key.
> And the file that is failing does indeed have a sha that corresponds to the key in the message
> above (and is out of range).
> Take care,
>  -stu
