hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-826) delete table followed by recreation results in honked table
Date Sat, 16 Aug 2008 21:43:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12623163#action_12623163
] 

stack commented on HBASE-826:
-----------------------------

At first it was easy to provoke the issue.  Two regions were sufficient.  Now with the J-D
patch in place, things are better for sure.  On small tables the problem is gone.

I'm testing with bigger tables to be sure.

I don't think the fix is all in though, or there is another issue lurking.  Last night at
least I found two lone edits from an old instance of a table shining through when doing a
getClosest against the new table (the manifestation in the MR job is the log snippet posted
at the start of this issue).

I'm testing by running sizeable imports -- 4 to 32M -- on our little cluster of 4 machines
over and over again.  I upload, then delete the table.  I then stop hbase so that memcache is
flushed out to the FileSystem (the issue seems to be related to traversal of many store files
in the FileSystem).  It is on the subsequent import that I see issues.

I'm having trouble reliably replicating the errant edits this morning.  Testing works most
of the time.  

Repair of a table with old edits is tough too.  I can't scan using an old timestamp to find
the errant edits.  I ain't sure why; my guess is that, as with how getFull works, the presence
of newer deletes can overshadow older cells.  If we knew the HStoreKey of the errant edits,
a repair would just be a case of adding deletes against their coordinates, but discovering
the errant edit coordinates is tough; up to now I've only been able to do it by adding
super-intrusive logging to the getClosestAtOrBefore calculation or by sort/joins on the
output of the iterate script above.
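
To make the guess about overshadowing concrete, here is a toy model in Python.  It is not
HBase code and calls no HBase API; the names (put, delete, scan_at) and the masking rule
itself are assumptions made up for illustration.  The rule assumed is that a delete marker
hides every cell at the same coordinates with a timestamp at or before the marker, no matter
which timestamp the reader asks for.

```python
from typing import Dict, List, Optional, Tuple

# Toy model only (hypothetical, not HBase): each (row, column) coordinate maps
# to a list of (timestamp, value) versions; a value of None is a delete marker.
Coordinate = Tuple[str, str]
Version = Tuple[int, Optional[bytes]]
Store = Dict[Coordinate, List[Version]]


def put(store: Store, row: str, column: str, ts: int, value: bytes) -> None:
    """Record an edit at the given coordinates and timestamp."""
    store.setdefault((row, column), []).append((ts, value))


def delete(store: Store, row: str, column: str, ts: int) -> None:
    """Record a delete marker at the given coordinates and timestamp."""
    store.setdefault((row, column), []).append((ts, None))


def scan_at(store: Store, row: str, column: str, max_ts: int) -> List[Tuple[int, bytes]]:
    """Return visible cells with timestamp <= max_ts, newest first.

    Assumed rule: the newest delete marker masks all cells at or before its
    own timestamp, even when max_ts is older than the marker.
    """
    versions = store.get((row, column), [])
    delete_stamps = [ts for ts, value in versions if value is None]
    newest_delete = max(delete_stamps) if delete_stamps else None
    visible = [
        (ts, value)
        for ts, value in versions
        if value is not None
        and ts <= max_ts
        and (newest_delete is None or ts > newest_delete)
    ]
    return sorted(visible, reverse=True)


store: Store = {}
put(store, "row-1", "info:a", 100, b"old")   # edit from the old table instance
delete(store, "row-1", "info:a", 200)        # delete written later
put(store, "row-1", "info:a", 300, b"new")   # edit from the new table instance

old_view = scan_at(store, "row-1", "info:a", 150)  # scan "back in time"
new_view = scan_at(store, "row-1", "info:a", 300)
```

Under this assumed rule the scan at timestamp 150 comes back empty even though an edit
exists at timestamp 100, which would match not being able to find old edits by scanning
with an old timestamp.  It also shows why repair reduces to writing a delete at the errant
coordinates once they are known.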

> delete table followed by recreation results in honked table
> -----------------------------------------------------------
>
>                 Key: HBASE-826
>                 URL: https://issues.apache.org/jira/browse/HBASE-826
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.2.1, 0.3.0
>
>         Attachments: hbase-826_0.3.0.patch
>
>
> Daniel Leffel suspected that delete and then recreate causes issues.  I tried it on our
little cluster.  I'm doing a MR load up into the newly created table and after a few million
rows, the MR job just hangs.  It's looking for a region that doesn't exist:
> {code}
> 2008-08-13 03:32:36,840 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM
Metrics with processName=MAP, sessionId=
> 2008-08-13 03:32:36,940 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
> 2008-08-13 03:32:37,420 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
Found ROOT REGION => {NAME => '-ROOT-,,0', STARTKEY => '', ENDKEY => '', ENCODED
=> 70236052, TABLE => {{NAME => '-ROOT-', IS_ROOT => 'true', IS_META => 'true',
FAMILIES => [{NAME => 'info', BLOOMFILTER => 'false', COMPRESSION => 'NONE', VERSIONS
=> '1', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE
=> 'false'}]}}
> 2008-08-13 03:32:37,541 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
reloading table servers because: HRegionInfo was null or empty in .META.
> 2008-08-13 03:32:37,541 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
Removed .META.,,1 from cache because of TestTable,0008388608,99999999999999
> 2008-08-13 03:32:37,544 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
Found ROOT REGION => {NAME => '-ROOT-,,0', STARTKEY => '', ENDKEY => '', ENCODED
=> 70236052, TABLE => {{NAME => '-ROOT-', IS_ROOT => 'true', IS_META => 'true',
FAMILIES => [{NAME => 'info', BLOOMFILTER => 'false', COMPRESSION => 'NONE', VERSIONS
=> '1', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE
=> 'false'}]}}
> 2008-08-13 03:32:47,605 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
reloading table servers because: HRegionInfo was null or empty in .META.
> 2008-08-13 03:32:47,606 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
Removed .META.,,1 from cache because of TestTable,0008388608,99999999999999
> ....
> {code}
> My guess is that it's a region that was in the table's previous incarnation with ghosts
left over down inside .META.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

