hbase-user mailing list archives

From "Geoff Hendrey" <ghend...@decarta.com>
Subject RE: PENDING_CLOSE for too long
Date Mon, 14 Nov 2011 23:22:05 GMT
thanks, and CCing my team

-----Original Message-----
From: Stuart Smith [mailto:stu24mail@yahoo.com] 
Sent: Monday, November 14, 2011 3:20 PM
To: user@hbase.apache.org
Subject: Re: PENDING_CLOSE for too long

Thanks Geoff!

  The slow reply was due to the saga being moved to the cloudera lists.

I ended up trying to merge all my regions (offline) using the java API (since I had gotten
to about 20K regions for a given table), and messing up badly, so I just started from scratch,
and have started reloading data with a new max region filesize.

This took the number of regions I had from 20K to high hundreds, and so far, hbase seems much
happier - I'm only about 1/2 - 2/3's of the way to where I was before, though, so we'll see
what happens, but it does seem to work a lot better :)

Btw.. if you use the merge API.. make sure you don't accidentally comment out code that sorts
your region listing by key before you start merging.. the API will happily let you merge any
two random regions.. creating lots of interesting overlaps.... :O
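
To make the sorting point concrete, here is a minimal sketch of the kind of guard that would have caught the mistake. It is not the code from this thread and it does not call the HBase merge API at all: `KeyRange` and `RegionMergePlanner` are hypothetical names made up for illustration, and the only assumptions are that each region is described by a start key (inclusive) and an end key (exclusive), with an empty key marking the start or end of the table. It sorts the listing by start key and refuses to pair two regions unless the first one ends exactly where the second begins.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a region's key range; this is NOT an HBase class.
final class KeyRange {
    final byte[] startKey; // inclusive; empty array means "start of table"
    final byte[] endKey;   // exclusive; empty array means "end of table"

    KeyRange(String start, String end) {
        this.startKey = start.getBytes(StandardCharsets.UTF_8);
        this.endKey = end.getBytes(StandardCharsets.UTF_8);
    }
}

// Hypothetical helper: plans merges without touching a cluster.
class RegionMergePlanner {

    // Unsigned lexicographic byte comparison, so the empty key sorts first.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // Returns a sorted copy; the caller's list is left untouched.
    static List<KeyRange> sortByStartKey(List<KeyRange> regions) {
        List<KeyRange> sorted = new ArrayList<>(regions);
        sorted.sort((x, y) -> compareKeys(x.startKey, y.startKey));
        return sorted;
    }

    // Two regions are safe to merge only if the left one ends exactly
    // where the right one starts (no hole, no overlap).
    static boolean isAdjacent(KeyRange left, KeyRange right) {
        return left.endKey.length > 0 && Arrays.equals(left.endKey, right.startKey);
    }

    // Pairs up neighbours (0,1), (2,3), ... from the sorted listing and
    // fails loudly instead of planning a merge of non-adjacent regions.
    static List<KeyRange[]> planMerges(List<KeyRange> regions) {
        List<KeyRange> sorted = sortByStartKey(regions);
        List<KeyRange[]> plan = new ArrayList<>();
        for (int i = 0; i + 1 < sorted.size(); i += 2) {
            KeyRange left = sorted.get(i);
            KeyRange right = sorted.get(i + 1);
            if (!isAdjacent(left, right)) {
                throw new IllegalStateException(
                    "Refusing to merge non-adjacent regions at sorted index " + i);
            }
            plan.add(new KeyRange[] {left, right});
        }
        return plan;
    }
}
```

Each pair returned by `RegionMergePlanner.planMerges(...)` would then be handed to whatever merge call you actually use. The check is deliberately strict: a hole or an overlap in the listing stops the run instead of producing more overlaps.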


Take care,
  -stu




________________________________
From: Geoff Hendrey <ghendrey@decarta.com>
To: user@hbase.apache.org
Cc: user@hbase.apache.org; Stuart Smith <stu24mail@yahoo.com>
Sent: Saturday, October 29, 2011 7:08 PM
Subject: Re: PENDING_CLOSE for too long

Stuart -

Have you disabled splitting? I believe you can work around the issue of PENDING_CLOSE by presplitting
your table and disabling splitting. Worked for us.

Sent from my iPhone

On Oct 29, 2011, at 4:19 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:

> In 0.92 (to be released in 2 weeks), you can expect improvement in this
> regard.
> See HBASE-3368.
> 
> Geoff:
> Can you publish your tool on HBASE JIRA ?
> 
> Thanks
> 
> On Sat, Oct 29, 2011 at 2:35 PM, Geoff Hendrey <ghendrey@decarta.com> wrote:
> 
> > Sure. I posted the code many weeks back for a tool that will repair holes
> > in .META.
> >
> > If you do a check on the list, you should find it. I'll send you the
> > latest code for that. Maybe I made some fixes after I posted the code.
> > Please ping me if I forget. I've used it to repair huge tables  (and fixed
> > subtle bugs in the process) so I'm confident it works.
> >
> > No matter what anyone tells me, I know hbase is horribly broken for the
> > use case of doing bulk writes from an mr job. It shits the bed every time
> > you pass a certain scale. For this reason we've completely rewritten our
> > code so that we use bulkloading. It's way more efficient and always works.
> >
> > Please ping me until I send you the code. Otherwise I will forget.
> >
> > Sent from my iPhone
> >
> > On Oct 29, 2011, at 1:39 PM, "Stuart Smith" <stu24mail@yahoo.com> wrote:
> >
> > > Hello Geoff,
> > >
> > > >   I usually don't show up here, since I use CDH, and good form means I
> > > > should stay on CDH-users,
> > > But!
> > >   I've been seeing the same issues for months:
> > >
> > > >  - PENDING_CLOSE too long, master tries to reassign - I see a
> > > >    continuous stream of these.
> > > >  - WrongRegionExceptions due to overlapping regions & holes in the
> > > >    regions.
> > >
> > > > I just spent all day yesterday cribbing off of St.Ack's check_meta.rb
> > > > script to write a java program to fix up overlaps & holes in an offline
> > > > fashion (hbase down, directly on hdfs), and will start testing next week
> > > > (cross my fingers!).
> > >
> > > It seems like the pending close messages can be ignored?
> > > > And once I test my tool, and confirm I know a little bit about what I'm
> > > > doing, maybe we could share notes?
> > >
> > > Take care,
> > >   -stu
> > >
> > >
> > >
> > > ________________________________
> > > From: Geoff Hendrey <ghendrey@decarta.com>
> > > To: user@hbase.apache.org
> > > Cc: hbase-user@hadoop.apache.org
> > > Sent: Saturday, September 3, 2011 12:11 AM
> > > Subject: RE: PENDING_CLOSE for too long
> > >
> > > "Are you having trouble getting to any of your data out in tables?"
> > >
> > > depends what you mean. We see corruptions from time to time that prevent
> > > us from getting data, one way or another. Today's corruption was regions
> > > with duplicate start and end rows. We fixed that by deleting the
> > > offending regions from HDFS, and running add_table.rb to restore the
> > > meta. The other common corruption is the holes in ".META." that we
> > > repair with a little tool we wrote. We'd love to learn why we see these
> > > corruptions with such regularity (seemingly much higher than others on
> > > the list).
> > >
> > > > We will implement the timeout you suggest, and see how it goes.
> > >
> > > Thanks,
> > > Geoff
> > >
> > > -----Original Message-----
> > > From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
> > > Stack
> > > Sent: Friday, September 02, 2011 10:51 PM
> > > To: user@hbase.apache.org
> > > Cc: hbase-user@hadoop.apache.org
> > > Subject: Re: PENDING_CLOSE for too long
> > >
> > > Are you having trouble getting to any of your data out in tables?
> > >
> > > To get rid of them, try restarting your master.
> > >
> > > Before you restart your master, do "HBASE-4126  Make timeoutmonitor
> > > timeout after 30 minutes instead of 3"; i.e. set
> > > "hbase.master.assignment.timeoutmonitor.timeout" to 1800000 in
> > > hbase-site.xml.
> > >
> > > St.Ack
> > >
> > > On Fri, Sep 2, 2011 at 1:40 PM, Geoff Hendrey <ghendrey@decarta.com>
> > > wrote:
> > > > > In the master logs, I am seeing "regions in transition timed out" and
> > > > > "region has been PENDING_CLOSE for too long, running forced unassign".
> > > > Both of these log messages occur at INFO level, so I assume they are
> > > > innocuous. Should I be concerned?
> > > >
> > > >
> > > >
> > > > -geoff
> > > >
> > > >
> >
