Mailing-List: contact hbase-dev-help@hadoop.apache.org; run by ezmlm
Reply-To: hbase-dev@hadoop.apache.org
Message-ID: <1098508502.1220836484284.JavaMail.jira@brutus>
Date: Sun, 7 Sep 2008 18:14:44 -0700 (PDT)
From: "Jonathan Gray (JIRA)"
To: hbase-dev@hadoop.apache.org
Subject: [jira] Commented: (HBASE-868) Incrementing binary rows cause strange behavior once table splits
In-Reply-To: <1643124107.1220548184318.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

    [ https://issues.apache.org/jira/browse/HBASE-868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629031#action_12629031 ]

Jonathan Gray commented on
HBASE-868:
-------------------------------------

There's still something broken here. It's not related to the splitting and all that; that still appears to be fixed.

Now there is an issue with obtaining an HTable reference to the same table in question. The exception is:

org.apache.hadoop.hbase.TableNotFoundException: items
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:508)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:460)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:420)
        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:130)
        at HBaseRef.<init>(HBaseRef.java:29)
        at Import.<init>(Import.java:20)
        at Import.main(Import.java:26)

Will post more info as I discover it.

> Incrementing binary rows cause strange behavior once table splits
> -----------------------------------------------------------------
>
>                 Key: HBASE-868
>                 URL: https://issues.apache.org/jira/browse/HBASE-868
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.2.1
>         Environment: 4 nodes: 1 master/namenode, 3 regionserver/datanodes
>            Reporter: Jonathan Gray
>            Assignee: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.2.1
>
>         Attachments: hbase-868-0.2-v2.patch, hbase-868-0.2-v3.patch, hbase-868-0.2-v4.patch, hbase-868-0.2-v5.patch, hbase-868-branch-0.2.patch, hbase-hbase-regionserver-web3.streamy.com.log, master_regionsinsources.png, regionserver_regionlist.png
>
>
> We're now using incrementing binary row keys and to this point had only been doing small tests with them, never having actually had a table split.
> I'm still working through the logs, but it seems that there's a problem somewhere with startKey and endKeys.
> Binary in general is not well supported (inconsistent display in the logs, very odd rendering in the web UI, hard to interpret in the shell, etc.).
> Once we figure out these serious bugs, we will spend some time trying to clean that up. But right now this makes things even harder to debug.
> The actual symptoms are that my import eventually started to throw (in the client and on the region server):
>
> org.apache.hadoop.hbase.regionserver.WrongRegionException: org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row out of range for HRegion sources,,1220546297947, startKey='', getEndKey()='�', row='c�'
>         at org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.java:1775)
>         at org.apache.hadoop.hbase.regionserver.HRegion.obtainRowLock(HRegion.java:1831)
>         at org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1387)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.batchUpdate(HRegionServer.java:1145)
>         at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:616)
>         at org.apache.hadoop.hbase.ipc.HbaseRPC$Server.call(HbaseRPC.java:473)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
>
> There are 3 regionservers, but this error only happens on one of them (while the other two always continue normally, allowing updates to this same table).
> The regionserver that this happens on is special for two reasons: first, it is hosting the META table; second, it also hosts the first region in the table, startkey = ''. I'm unsure which is the cause; I have a clue leading to both.
> After about 15 minutes, the regionserver sees:
> 2008-09-04 09:52:57,948 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region .META.,,1.
Current region memcache size 24.5k
> 2008-09-04 09:52:58,003 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Added /hbase/.META./1028785192/historian/mapfiles/8699673838203663799 with 106 entries, sequence id 25341510, data size 8.9k, file size 10.6k
> 2008-09-04 09:52:58,050 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Added /hbase/.META./1028785192/info/mapfiles/1791564557665476834 with 96 entries, sequence id 25341510, data size 14.2k, file size 15.8k
> 2008-09-04 09:52:58,050 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region .META.,,1 in 102ms, sequence id=25341510, compaction requested=true
> 2008-09-04 09:52:58,050 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region: .META.,,1
> 2008-09-04 09:52:58,051 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region .META.,,1
> 2008-09-04 09:52:58,055 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Compaction size of 1028785192/historian: 41.9k; Skipped 1 files, size: 21957
> 2008-09-04 09:52:58,088 DEBUG org.apache.hadoop.hbase.regionserver.HStore: started compaction of 2 files into /hbase/.META./compaction.dir/1028785192/historian/mapfiles/6948796056606699674
> 2008-09-04 09:52:58,128 DEBUG org.apache.hadoop.hbase.regionserver.HStore: moving /hbase/.META./compaction.dir/1028785192/historian/mapfiles/6948796056606699674 to /hbase/.META./1028785192/historian/mapfiles/75733875840914142
> 2008-09-04 09:52:58,175 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Completed compaction of 1028785192/historian; store size is 41.1k; time since last major compaction: 5426 seconds
> 2008-09-04 09:52:58,179 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Compaction size of 1028785192/info: 61.9k; Skipped 0 files, size: 0
> 2008-09-04 09:52:58,192 DEBUG org.apache.hadoop.hbase.regionserver.HStore: started compaction of 3 files into
/hbase/.META./compaction.dir/1028785192/info/mapfiles/7781013568996125923
> 2008-09-04 09:52:58,260 DEBUG org.apache.hadoop.hbase.regionserver.HStore: moving /hbase/.META./compaction.dir/1028785192/info/mapfiles/7781013568996125923 to /hbase/.META./1028785192/info/mapfiles/2187291308709057119
> 2008-09-04 09:52:58,290 DEBUG org.apache.hadoop.hbase.regionserver.HStore: Completed compaction of 1028785192/info; store size is 61.0k; time since last major compaction: 32534 seconds
> 2008-09-04 09:52:58,296 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region .META.,,1 in 0sec
> 2008-09-04 09:53:09,620 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -2085474968086468199 lease expired
> 2008-09-04 09:54:35,449 INFO org.apache.hadoop.hbase.regionserver.LogRoller: Rolling hlog. Number of entries: 30009
>
> Following this, insertion continues normally. This leads me to believe there's an issue with the META table memcache, but oddly the other regions of this table on other regionservers continue on fine.
> As for hosting the first region of the table on this regionserver, it seems consistent that I get the row-out-of-range errors when looking for a region with startkey = '', although there are 5 other regions on this RS.
> Will attach full logs from master and three RS. Also a couple of screenshots showing weird behavior in listing the regions in the table.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
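The report above doesn't pin down the root cause, but the WrongRegionException symptom (a row rejected as out of range for a region whose startKey is '') is consistent with binary row keys being compared with the wrong byte semantics. Region range checks need unsigned lexicographic byte order; Java's `byte` is signed, so a naive comparison sorts any byte >= 0x80 (common in binary keys, and the source of the '�' rendering in the logs) *before* ASCII. The following is a minimal sketch, not HBase source; the class and method names are invented for illustration:

```java
// Sketch: why signed vs unsigned byte comparison changes row-key ordering.
// (Illustrative only -- not the HBase implementation.)
public class RowKeyOrder {

    // Unsigned lexicographic compare: the ordering region boundaries assume.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int ai = a[i] & 0xff;   // mask to treat the byte as 0..255
            int bi = b[i] & 0xff;
            if (ai != bi) {
                return ai - bi;
            }
        }
        return a.length - b.length; // shorter key sorts first on a tie
    }

    // Naive compare using Java's signed bytes: 0x80..0xff become negative,
    // so binary keys with the high bit set sort *before* ASCII keys.
    static int compareSigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return a[i] - b[i];
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] ascii = {'c'};                                   // 0x63
        byte[] high  = {(byte) 0xEF, (byte) 0xBF, (byte) 0xBD}; // the bytes behind the '�' in the logs
        // Unsigned: 0x63 < 0xEF, so 'c' correctly falls below the high-byte boundary.
        System.out.println(compareUnsigned(ascii, high) < 0);
        // Signed: (byte) 0xEF is -17, so the same check flips -- the row
        // appears to be past the region's end key, i.e. "out of range".
        System.out.println(compareSigned(ascii, high) < 0);
    }
}
```

Under signed comparison a row like 'c�' (ASCII prefix followed by high bytes) can land outside the boundaries of the region that should hold it, which would produce exactly the checkRow failure quoted above.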