hbase-dev mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: something wrong on trunk possibly
Date Sun, 05 Apr 2009 21:24:40 GMT
I'm hoping the new key format will help with these things a bit.  One
solution is a dump/restore of .META. whereby you dump the known keys, then
delete and reload the table after truncating and deleting all the store
files.  We don't have tools for that yet iirc...

Maybe HBASE-1234 will help fix this problem in a fundamental way?
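
To make the dump/restore idea a bit more concrete, here is a rough sketch in plain Java. It is a toy model only: .META. is treated as a sorted map of row key to columns, and none of this is the HBase client API. The class name, the sample row keys and the helper methods are all made up for illustration, and the column names (info:regioninfo, info:server, info:serverstartcode) are my assumption of what the catalog rows carry. It shows a closest-at-or-before lookup landing on an orphan row that has no HRegionInfo, and a "dump known keys" pass that leaves such rows out before a reload.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only: .META. as a sorted map of row key -> (column -> value).
// Nothing here is the real HBase API; all names are hypothetical.
class MetaDumpSketch {
    // Assumed name of the column holding the serialized HRegionInfo.
    static final String REGIONINFO = "info:regioninfo";

    // Closest row at or before the given key, loosely in the spirit of a
    // closest-row-before lookup. Real .META. key ordering is more involved.
    static Map.Entry<String, Map<String, String>> closestAtOrBefore(
            TreeMap<String, Map<String, String>> meta, String key) {
        return meta.floorEntry(key);
    }

    // "Dump the known keys": keep only rows that still carry a non-empty
    // regioninfo value; orphaned server/startcode rows are left behind.
    static TreeMap<String, Map<String, String>> dumpKnownRows(
            TreeMap<String, Map<String, String>> meta) {
        TreeMap<String, Map<String, String>> kept = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> e : meta.entrySet()) {
            String hri = e.getValue().get(REGIONINFO);
            if (hri != null && !hri.isEmpty()) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    // Made-up sample data: one healthy row and one orphan with no regioninfo.
    static TreeMap<String, Map<String, String>> sampleMeta() {
        TreeMap<String, Map<String, String>> meta = new TreeMap<>();
        meta.put("t1,aaa,1", Map.of(REGIONINFO, "hri-1", "info:server", "rs1:60020"));
        meta.put("t1,mmm,2", Map.of("info:server", "rs2:60020",
                                    "info:serverstartcode", "123"));
        return meta;
    }

    public static void main(String[] args) {
        TreeMap<String, Map<String, String>> meta = sampleMeta();
        // Before the cleanup, the lookup lands on the orphan row.
        System.out.println(closestAtOrBefore(meta, "t1,nnn,9").getKey());
        // After dumping only known rows, it falls back to the healthy row.
        TreeMap<String, Map<String, String>> reloaded = dumpKnownRows(meta);
        System.out.println(closestAtOrBefore(reloaded, "t1,nnn,9").getKey());
    }
}
```

A real tool would also have to deal with the store files and with timestamps on the leftover cells, which this sketch ignores entirely.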




On Sun, Apr 5, 2009 at 1:25 PM, stack <stack@duboce.net> wrote:

> Sorry, I meant to write earlier.
>
> I've seen this condition in the past, back when deletes were not working
> properly.  What I'd see is that the HRegionInfo entry in .META. had been
> deleted but, somehow, a bug meant that the accompanying startcode and
> server entries were not deleted.  These startcode and server entries would
> bubble up during getClosest but were not easily deletable since they were
> not in the current set of .META. regions.  We seemed to have put this issue
> behind us.  Maybe your OOME during the bulk upload brought it on?
>
> At Powerset, we had a condition where a table had entries in .META. that
> had been made with an old hbase.  Updating to an hbase with the deletes fix
> was not sufficient; when these empty HRIs shine through, you can't delete
> the startcode and server entries, seemingly because "they are not in the
> table" (getting their timestamps proved awkward).  The only recourse back
> then was renaming the table so it got a new namespace in .META.
>
> St.Ack
>
> On Sat, Apr 4, 2009 at 12:14 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>
> > I looked at the commits on trunk, nothing new recently.
> >
> > Some weird corruption and scanner errors in trunk.... nuking /hbase and
> > restarting fixed it, something wrong with the .META. table obviously.
> >
> > Looks like what is happening is findClosestBefore() returns an 'empty'
> > RowResult, with absolutely no columns in it; furthermore, the row id
> > doesn't appear in my 'region list' Web-UI.  So it's not an active, live
> > region, it's some other artifact that is still hanging out. Maybe it's a
> > phantom delete showing up as an entry.
> >
> > I'm not sure it's worthwhile debugging until after HBASE-1234 comes out.
> > After all, the buggy code is probably being substantially reworked and/or
> > removed.
> >
> > -ryan
> >
> > On Sat, Apr 4, 2009 at 2:19 AM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> >
> > > Hey guys,
> > >
> > > There seems to be something wrong on trunk... I used to have long
> > > map-reduce jobs, but now they are failing, unable to commit:
> > >
> > > 2009-04-04 01:17:09,279 DEBUG
> > > org.apache.hadoop.hbase.client.HConnectionManager$TableServers:
> > > locateRegionInMeta attempt 5 of 10 failed; retrying after sleep of 8000
> > > java.io.IOException: HRegionInfo was null or empty in .META.
> > >         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:566)
> > >         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:515)
> > >         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:484)
> > > ... etc
> > >
> > > Basically mappers get stuck up on commits and make no progress, mapred
> > > kills them, done.
> > >
> > > I've spent some time banging at it - made sure that ulimit -n is good,
> > > set the ipc handler limit to 30, cranked down the number of maps I'm
> > > doing, etc.  To no avail.
> > >
> > > At least I figured out how to debug hadoop jobs a bit.
> > >
> > > Anyone have thoughts?
> > >
> >
>
