hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-826) delete table followed by recreation results in honked table
Date Fri, 29 Aug 2008 04:54:46 GMT

    [ https://issues.apache.org/jira/browse/HBASE-826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12626844#action_12626844 ]

stack commented on HBASE-826:
-----------------------------

Ok, so I figured out the bug I was seeing AFTER applying the J-D patch above. I thought it
was more of what J-D's patch was supposed to fix, but it's something else, something equally
ugly.

Here is how I consistently generated the problem with J-D's patch in place:
{code}
Fill a table.
Stop hbase so files are flushed.
Start hbase.
Remove table (disable/drop).
Stop hbase so again flushed to filesystem.
Then look at the content of the .META. using above iteratemeta script.
Sort -u output then run below 'check' script to match delete and non-delete cells
{code}

Here's the script:
{code}
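#!/usr/bin/env ruby
# Reads the sorted and uniqued output of iteratemeta.rb on STDIN.
# Prints each cell that is not followed by a matching delete line (one ending in ' d').
lastline = nil
for line in STDIN
  if line =~ /(.*)\s+d$/
    # Delete line: the preceding cell is covered only if its key matches.
    if lastline != nil
      puts lastline unless lastline.eql?($1)
    end
    lastline = nil
  else
    # Non-delete line: the previous cell, if any, had no delete after it.
    puts lastline unless lastline == nil
    lastline = line.rstrip
  end
end
# Report a trailing cell that had no delete after it.
puts lastline unless lastline == nil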
{code}
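
To make the matching rule easier to try out, here is a minimal sketch of the same idea as a function that returns the unmatched cells instead of printing them. The helper name `unmatched_cells` and the sample line format (`<row> <column> <timestamp>`, with ` d` appended for a delete) are assumptions for illustration; the real iteratemeta.rb output may look different. Unlike a straight port, this sketch also reports a trailing cell at the end of the input.

```ruby
# Hypothetical helper: same matching idea as the check script, but it takes
# an array of sorted lines and returns the cells that have no matching delete.
def unmatched_cells(sorted_lines)
  unmatched = []
  last = nil
  sorted_lines.each do |line|
    if line =~ /(.*)\s+d$/
      # Delete line: it covers the preceding cell only if the keys agree.
      unmatched << last if last && last != $1.rstrip
      last = nil
    else
      # Non-delete line: the previous cell, if any, had no delete after it.
      unmatched << last if last
      last = line.rstrip
    end
  end
  # A cell at the very end of the input has no delete after it either.
  unmatched << last if last
  unmatched
end

# Assumed line format, for illustration only.
sample = [
  "TestTable,,1 info:regioninfo 100",
  "TestTable,,1 info:regioninfo 100 d",
  "TestTable,,1 info:server 100",
  "TestTable,,1 info:server 200",
  "TestTable,,1 info:server 200 d",
]
puts unmatched_cells(sample)
```

With this sample, only the `info:server` cell at timestamp 100 has no delete directly after it, so it is the one reported.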

I was finding a few keys that should have had overshadowing deletes, but the deletes were not
present.  (If I attempted refilling the table, eventually we'd fail with the 'empty HRI' complaint.)

I thought the fail was because the above J-D patch was incomplete.

Turns out it's a problem in compactions.  We have been seeing it since the compaction algorithm changed.

Here is what is happening.

Max versions by default in the '.META.' table is *1*.  The version check looks at the row and column
components of an HStoreKey only, i.e. not at the timestamp.  info:serverstartcode and info:server are edited
every time we start up and when we offline (disable) and delete.  The delete cell has the same timestamp
as the cell it would delete, i.e. it'll usually be older than the offlining update.
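
A toy model may make the version check concrete. This is not HBase code: `Cell`, `delete_marker`, and `apply_max_versions` are hypothetical names, and the timestamps are made up. The sketch only shows what happens when the version limit is keyed on row and column alone, so that a delete marker competes with ordinary cells for the single slot.

```ruby
# Toy model only; all names here are hypothetical, not HBase classes.
Cell = Struct.new(:row, :column, :timestamp, :delete_marker)

# Keep at most max_versions cells per (row, column), newest first.
# The timestamp orders the cells but is not part of the grouping key,
# so a delete marker competes with ordinary cells for the same slot.
def apply_max_versions(cells, max_versions = 1)
  cells.group_by { |c| [c.row, c.column] }
       .flat_map { |_key, group| group.sort_by { |c| -c.timestamp }.first(max_versions) }
end

cells = [
  Cell.new("TestTable,,1", "info:server", 100, true),   # delete marker, older
  Cell.new("TestTable,,1", "info:server", 200, false),  # offlining update, newer
]
kept = apply_max_versions(cells)
kept.each { |c| puts "#{c.column} ts=#{c.timestamp} delete=#{c.delete_marker}" }
```

With max versions at 1, only the newer offlining update survives; the older delete marker is treated as a surplus version and dropped.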

So, with our new smart compaction, we do not include every file when we compact.  What I was seeing
after the restarts was that only the two most recent files would have been compacted on the final restart.
We'd discard the delete cell, keeping only the newer offlining update of info:server and info:serverstartcode.
The original cell would still be back in the biggest and oldest storefile, and on occasion it would
get a chance to come through when doing getClosestAtOrBefore.
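
The sequence above can be sketched as a small simulation. It is a simplified model under stated assumptions, not the real store code: store files are plain arrays, `compact` and `visible` are hypothetical helpers, and a delete marker is assumed to hide only a cell with the same row, column, and timestamp.

```ruby
# Toy model only; all names here are hypothetical, not HBase classes.
Cell = Struct.new(:row, :column, :timestamp, :delete_marker)

# Merge the given store files and keep max_versions cells per (row, column).
def compact(files, max_versions = 1)
  files.flatten(1)
       .group_by { |c| [c.row, c.column] }
       .flat_map { |_key, group| group.sort_by { |c| -c.timestamp }.first(max_versions) }
end

# Cells a reader could still see: non-delete cells with no delete marker at
# the same row, column and timestamp anywhere in the store.
def visible(files)
  all = files.flatten(1)
  deleted = all.select(&:delete_marker).map { |c| [c.row, c.column, c.timestamp] }
  all.reject { |c| c.delete_marker || deleted.include?([c.row, c.column, c.timestamp]) }
end

row = "TestTable,,1"
oldest = [Cell.new(row, "info:server", 100, false)]  # original cell
newer  = [Cell.new(row, "info:server", 100, true)]   # delete of the original
newest = [Cell.new(row, "info:server", 200, false)]  # offlining update

before = visible([oldest, newer, newest]).map(&:timestamp).sort
after  = visible([oldest, compact([newer, newest])]).map(&:timestamp).sort
puts "visible before: #{before.inspect}"
puts "visible after:  #{after.inspect}"
```

Before the partial compaction only the cell at timestamp 200 is visible. Afterwards the cell at timestamp 100 is visible again, because the marker that hid it was dropped while the file holding the original was left out of the compaction.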

So, I think the fix is that we cannot remove anything when compacting, not until we do a major
compaction when all files are in play.  Will talk to Billy over in HBASE-834 about it.  Meantime,
will apply the J-D patch and close this issue.
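
One way to picture the proposed rule is a compaction that only discards cells when every store file takes part. This is a sketch of the idea, not the eventual patch: the `major:` flag, the `compact` helper, and the `Cell` struct are all assumptions for illustration.

```ruby
# Toy model only; all names here are hypothetical, not HBase classes.
Cell = Struct.new(:row, :column, :timestamp, :delete_marker)

# Hypothetical compaction that only discards cells when every store file
# takes part (major: true). A minor compaction just merges its inputs.
def compact(files, major:, max_versions: 1)
  merged = files.flatten(1)
  return merged unless major

  # All files are in play, so deletes can be applied and then dropped.
  deleted = merged.select(&:delete_marker).map { |c| [c.row, c.column, c.timestamp] }
  live = merged.reject { |c| c.delete_marker || deleted.include?([c.row, c.column, c.timestamp]) }
  live.group_by { |c| [c.row, c.column] }
      .flat_map { |_key, group| group.sort_by { |c| -c.timestamp }.first(max_versions) }
end

row = "TestTable,,1"
oldest = [Cell.new(row, "info:server", 100, false)]  # original cell
newer  = [Cell.new(row, "info:server", 100, true)]   # delete of the original
newest = [Cell.new(row, "info:server", 200, false)]  # offlining update

minor = compact([newer, newest], major: false)
major = compact([oldest, newer, newest], major: true)
puts "minor keeps delete marker: #{minor.any?(&:delete_marker)}"
puts "major result timestamps: #{major.map(&:timestamp).inspect}"
```

In this model the minor compaction carries the delete marker forward, so the original cell in the oldest file stays hidden until a major compaction can remove both together.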

> delete table followed by recreation results in honked table
> -----------------------------------------------------------
>
>                 Key: HBASE-826
>                 URL: https://issues.apache.org/jira/browse/HBASE-826
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.2.1, 0.18.0
>
>         Attachments: 826-v3.patch, hbase-826_0.3.0.patch
>
>
> Daniel Leffel suspected that delete and then recreate causes issues.  I tried it on our
> little cluster.  I'm doing a MR load up into the newly created table and after a few million
> rows, the MR job just hangs.  It's looking for a region that doesn't exist:
> {code}
> 2008-08-13 03:32:36,840 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
> 2008-08-13 03:32:36,940 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
> 2008-08-13 03:32:37,420 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Found ROOT REGION => {NAME => '-ROOT-,,0', STARTKEY => '', ENDKEY => '', ENCODED => 70236052, TABLE => {{NAME => '-ROOT-', IS_ROOT => 'true', IS_META => 'true', FAMILIES => [{NAME => 'info', BLOOMFILTER => 'false', COMPRESSION => 'NONE', VERSIONS => '1', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
> 2008-08-13 03:32:37,541 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: reloading table servers because: HRegionInfo was null or empty in .META.
> 2008-08-13 03:32:37,541 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Removed .META.,,1 from cache because of TestTable,0008388608,99999999999999
> 2008-08-13 03:32:37,544 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Found ROOT REGION => {NAME => '-ROOT-,,0', STARTKEY => '', ENDKEY => '', ENCODED => 70236052, TABLE => {{NAME => '-ROOT-', IS_ROOT => 'true', IS_META => 'true', FAMILIES => [{NAME => 'info', BLOOMFILTER => 'false', COMPRESSION => 'NONE', VERSIONS => '1', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'}]}}
> 2008-08-13 03:32:47,605 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: reloading table servers because: HRegionInfo was null or empty in .META.
> 2008-08-13 03:32:47,606 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$TableServers: Removed .META.,,1 from cache because of TestTable,0008388608,99999999999999
> ....
> {code}
> My guess is that it's a region that was in the table's previous incarnation with ghosts
> left over down inside .META.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

