hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com>
Subject Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2
Date Tue, 06 Nov 2012 05:59:57 GMT
Could you attach the master logs at this time
2012-11-05 00:24:55?

Regards
Ram

On Tue, Nov 6, 2012 at 11:15 AM, lars hofhansl <lhofhansl@yahoo.com> wrote:

> Took a brief look through all SPLIT related commits since 0.94.0... Found
> these:
>
> HBASE-6854 *
> HBASE-6713
> HBASE-6329 *
>
> HBASE-6088
>
> HBASE-5986
> HBASE-6070 *
>
>
> The ones marked with * are (IMHO) more likely to be related.
>
> -- Lars
>
> ________________________________
> From: Matt Corgan <mcorgan@hotpads.com>
> To: dev <dev@hbase.apache.org>; lars hofhansl <lhofhansl@yahoo.com>
> Sent: Monday, November 5, 2012 9:28 PM
> Subject: Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2
>
> Yeah - we were running .94.0 since it came out but never saw it there.
> I'll keep trying to narrow it down.  The only harm it's causing is log
> spam and failing to move daughters to a new regionserver, which are
> definitely problems, but it's not bringing down the cluster.
>
>
> On Mon, Nov 5, 2012 at 9:17 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
>
> > So it seems you can repeat this to some extend in 0.94.2, but you have
> > never seen this before?
> >
> >
> > -- Lars
> >
> >
> >
> > ________________________________
> >  From: Matt Corgan <mcorgan@hotpads.com>
> > To: dev <dev@hbase.apache.org>
> > Sent: Monday, November 5, 2012 9:10 PM
> > Subject: Re: infinite loop of RS_ZK_REGION_SPLIT on .94.2
> >
> > It happened in this new table that I have all the logs for.  The region
> in
> > question this time is 6839663e4f8f79be3d7469784c21cbc2, and the first
> trace
> > of this region is on the regionserver with the "Intantiated tableName..."
> > message
> >
> > 2012-11-05 22:24:21,162 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion:
> > Instantiated
> >
> >
> StatAreaModelLink,\x00\x00\x07\xD9\x00\x00\x00\x0C\x00\x00\x00\x004H\xC4\xB5\x00\x00\x00\x02\x00\x00\x00\x05\x00\x00\x00\x00G.l\x9B,1352172257535.6839663e4f8f79be3d74
> > 9784c21cbc2.
> >
> > I also know this region came from a recent split, but I can't find any
> log
> > messages show the parent finishing the split that created this daughter
> > region.  So my guess now is that the split is actually finishing and
> > letting clients continue to write data, but something is failing to print
> > the log line and correctly tell the master about the new region.
> >
> > I've noticed that these regions are showing up on the clients in calls to
> > HTable.getRegionLocations(), so the clients continue to function, but if
> I
> > call HBaseAdmin.move() i get an UnknownRegionException.
> >
> >
> > On Mon, Nov 5, 2012 at 7:07 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > I think what Matt encountered was this:
> > > https://issues.apache.org/jira/browse/HBASE-7101
> > >
> > > On Mon, Nov 5, 2012 at 5:33 PM, Matt Corgan <mcorgan@hotpads.com>
> wrote:
> > >
> > > > Beats me - i can't find any record of it before that.
> > > >
> > > > I'm importing data into another table now.  I disabled/enabled the
> > table
> > > > first to make sure we have the original 4 region locations logged
> > > > everywhere.  Will report back...
> > > >
> > > >
> > > > On Mon, Nov 5, 2012 at 2:15 PM, Jean-Daniel Cryans <
> > jdcryans@apache.org
> > > > >wrote:
> > > >
> > > > > Wow, and can you figure how this happened?
> > > > >
> > > > > 2012-11-05 00:24:54,538 DEBUG
> > > > > org.apache.hadoop.hbase.regionserver.HRegion: Instantiated
> > > > >
> > > > >
> > > >
> > >
> >
> ActiveListingRecord16,\x86\x07\xDC\x03\x17RealtyTrac\x0044737383,1352093084141.22f8fa73d8af837410ca270f344f6bb8.
> > > > >
> > > > > Split? Else how did the master told that RS to open a region that
> > > doesn't
> > > > > exist?
> > > > >
> > > > > On Mon, Nov 5, 2012 at 2:02 PM, Matt Corgan <mcorgan@hotpads.com>
> > > wrote:
> > > > > > Here's one from last night with the master logs at the bottom.
> > > > > > http://pastebin.com/cSdMbA2a
> > > > > >
> > > > > > I don't see that region in the master logs for 4 days back.
 I'm
> > > going
> > > > to
> > > > > > import some new data from scratch soon and will be sure to keep
> all
> > > the
> > > > > > master logs.
> > > > > >
> > > > > >
> > > > > > On Mon, Nov 5, 2012 at 9:52 AM, Jean-Daniel Cryans <
> > > > jdcryans@apache.org
> > > > > >wrote:
> > > > > >
> > > > > >> This reminds me a lot of
> > > > > https://issues.apache.org/jira/browse/HBASE-4792
> > > > > >>
> > > > > >> What I'd like to see is the log from the first time the
master
> > > > > >> receives the split message. I guess it says the region doesn't
> > exist
> > > > > >> anymore because the split was processed already in the master
> but
> > > > > >> there's a failure mode similar to 4792.
> > > > > >>
> > > > > >> I saw this on another cluster last week but the logs were
> rolling
> > > > > >> based on size so the original split message was lost.
> > > > > >>
> > > > > >> J-D
> > > > > >>
> > > > > >> On Sun, Nov 4, 2012 at 5:16 PM, Matt Corgan <
> mcorgan@hotpads.com>
> > > > > wrote:
> > > > > >> > I've had a few successful splits today without any
errors.
> I'll
> > > > turn
> > > > > up
> > > > > >> > the importer speed tomorrow to start stressing the
cluster
> more.
> > > > > >> >
> > > > > >> >
> > > > > >> > On Sun, Nov 4, 2012 at 2:07 PM, Ted Yu <yuzhihong@gmail.com>
> > > wrote:
> > > > > >> >
> > > > > >> >> Matt:
> > > > > >> >> Was there any region server crash before you noticed
the
> > repeated
> > > > log
> > > > > >> for
> > > > > >> >> bc62a8a72124a4ba3f6b9f302587903c ?
> > > > > >> >>
> > > > > >> >> I wonder if the following JIRA might be related:
> > > > > >> >> HBASE-7083 SSH#fixupDaughter should force re-assign
missing
> > > > daughter
> > > > > >> >>
> > > > > >> >> Thanks
> > > > > >> >>
> > > > > >> >> On Sat, Nov 3, 2012 at 7:27 PM, Matt Corgan <
> > mcorgan@hotpads.com
> > > >
> > > > > >> wrote:
> > > > > >> >>
> > > > > >> >> > Strangely, I don't see any record of that
region in the
> > master
> > > > > before
> > > > > >> >> what
> > > > > >> >> > I already pasted even though I have logs back
to 10/30.
> Next
> > > > time
> > > > > it
> > > > > >> >> > happens I'll gather a full log record and
try to debug
> while
> > > it's
> > > > > >> >> > occurring.
> > > > > >> >> >
> > > > > >> >> >
> > > > > >> >> > On Sat, Nov 3, 2012 at 7:10 PM, rajesh babu
chintaguntla <
> > > > > >> >> > chrajeshbabu32@gmail.com> wrote:
> > > > > >> >> >
> > > > > >> >> > > Hi Matt,
> > > > > >> >> > > can you paste some more master logs of
region
> > > > > >> >> > > bc62a8a72124a4ba3f6b9f30258790 before
split.
> > > > > >> >> > > I think Its not problem with splitting.
> > > > > >> >> > > We are getting
> > > > > >> >> > >       LOG.warn("Region " + encodedName
+ " not found on
> > > server
> > > > "
> > > > > +
> > > > > >> >> > > serverName +
> > > > > >> >> > >         "; failed processing");
> > > > > >> >> > > this log means no entry in servers map(not
fully
> assigned).
> > > > > >> >> > >     Set<HRegionInfo> hris = this.servers.get(sn);
> > > > > >> >> > >     HRegionInfo foundHri = null;
> > > > > >> >> > >     for (HRegionInfo hri: hris) {
> > > > > >> >> > >       if (hri.getEncodedName().equals(encodedName))
{
> > > > > >> >> > >         foundHri = hri;
> > > > > >> >> > >         break;
> > > > > >> >> > >       }
> > > > > >> >> > >     }
> > > > > >> >> > >     return foundHri;
> > > > > >> >> > >
> > > > > >> >> > >
> > > > > >> >> > >
> > > > > >> >> > >
> > > > > >> >> > > On Sun, Nov 4, 2012 at 6:07 AM, lars
hofhansl <
> > > > > lhofhansl@yahoo.com>
> > > > > >> >> > wrote:
> > > > > >> >> > >
> > > > > >> >> > > > CC'ing dev list...
> > > > > >> >> > > >
> > > > > >> >> > > > Is anybody aware of any changes
that went in recently
> > that
> > > > > could
> > > > > >> >> cause
> > > > > >> >> > > > this?
> > > > > >> >> > > > I looked around a bit, but could
not find anything
> > obvious.
> > > > > >> >> > > >
> > > > > >> >> > > > -- Lars
> > > > > >> >> > > >
> > > > > >> >> > > >
> > > > > >> >> > > >
> > > > > >> >> > > > ________________________________
> > > > > >> >> > > >  From: Matt Corgan <mcorgan@hotpads.com>
> > > > > >> >> > > > To: user <user@hbase.apache.org>
> > > > > >> >> > > > Sent: Saturday, November 3, 2012
5:27 PM
> > > > > >> >> > > > Subject: Re: infinite loop of RS_ZK_REGION_SPLIT
on
> .94.2
> > > > > >> >> > > >
> > > > > >> >> > > > I think the cluster is ok without
running hbck, as
> > > restarting
> > > > > the
> > > > > >> >> > > > regionserver process stops the warnings
and everything
> > > looks
> > > > ok
> > > > > >> >> > > otherwise.
> > > > > >> >> > > >
> > > > > >> >> > > > here's the regionserver right after
the split happens:
> > > > > >> >> > > > ------------------------
> > > > > >> >> > > > 2012-11-01 22:45:28,726 DEBUG
> > > > > >> >> > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > >> >> > > > regionserver:60020-0x13ab46479832953
Attempting to
> > > transition
> > > > > node
> > > > > >> >> > > > bc62a8a72124a4ba3f6b9f302587903c
from *RS_ZK_R*
> > > > > >> >> > > > *EGION_SPLITTING to RS_ZK_REGION_SPLIT*
> > > > > >> >> > > > 2012-11-01 22:45:28,730 DEBUG
> > > > > >> >> > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > >> >> > > > regionserver:60020-0x13ab46479832953
Successfully
> > > > transitioned
> > > > > >> node
> > > > > >> >> > > > bc62a8a72124a4ba3f6b9f302587903c
from RS_ZK_
> > > > > >> >> > > > REGION_SPLITTING to RS_ZK_REGION_SPLIT
> > > > > >> >> > > > 2012-11-01 22:45:28,730 DEBUG
> > > > > >> >> > > > org.apache.hadoop.hbase.regionserver.SplitTransaction:
> > > Still
> > > > > >> waiting
> > > > > >> >> on
> > > > > >> >> > > the
> > > > > >> >> > > > master to process the split for
> > > > > bc62a8a72124a4ba3f6b9f302587903c
> > > > > >> >> > > > 2012-11-01 22:45:28,832 DEBUG
> > > > > >> >> > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > >> >> > > > regionserver:60020-0x13ab46479832953
Attempting to
> > > transition
> > > > > node
> > > > > >> >> > > > bc62a8a72124a4ba3f6b9f302587903c
from RS_ZK_R
> > > > > >> >> > > > EGION_SPLIT to RS_ZK_REGION_SPLIT
> > > > > >> >> > > > 2012-11-01 22:45:28,837 DEBUG
> > > > > >> >> > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > >> >> > > > regionserver:60020-0x13ab46479832953
Successfully
> > > > transitioned
> > > > > >> node
> > > > > >> >> > > > bc62a8a72124a4ba3f6b9f302587903c
from RS_ZK_
> > > > > >> >> > > > REGION_SPLIT to RS_ZK_REGION_SPLIT
> > > > > >> >> > > > -----------------------------
> > > > > >> >> > > >
> > > > > >> >> > > > The "transitioned node from RS_ZK_REGION_SPLIT
to
> > > > > >> RS_ZK_REGION_SPLIT"
> > > > > >> >> > > > continues for 15 or so hours and
finally settles
> without
> > > > manual
> > > > > >> >> > > > intervention with these regionserver
log messages:
> > > > > >> >> > > > -----------------------
> > > > > >> >> > > > 2012-11-02 13:55:00,906 DEBUG
> > > > > >> >> > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > >> >> > > > regionserver:60020-0x13ab46479832953
Attempting to
> > > transition
> > > > > >> node *
> > > > > >> >> > > > bc62a8a72124a4ba3f6b9f302587903c*
from
> RS_ZK_REGION_SPLIT
> > > to
> > > > > >> >> > > > RS_ZK_REGION_SPLIT
> > > > > >> >> > > >
> > > > > >> >> > > > 2012-11-02 13:55:00,916 DEBUG
> > > > > >> >> > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > >> >> > > > regionserver:60020-0x13ab46479832953
Successfully
> > > > transitioned
> > > > > >> node *
> > > > > >> >> > > > bc62a8a72124a4ba3f6b9f302587903c*
from
> RS_ZK_REGION_SPLIT
> > > to
> > > > > >> >> > > > RS_ZK_REGION_SPLIT
> > > > > >> >> > > >
> > > > > >> >> > > > 2012-11-02 13:55:00,916 INFO
> > > > > >> >> > > > org.apache.hadoop.hbase.regionserver.SplitRequest:
> Region
> > > > > split,
> > > > > >> META
> > > > > >> >> > > > updated, and report to master.
> > > > > >> >> > > >
> > > > > >> >> > > >
> > > > > >> >> > >
> > > > > >> >> >
> > > > > >> >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> Parent=ActiveListingRecord16,\x83\x07\xDC\x07\x01Obeo\x00690461,1351816858693.
> > > > > >> >> > > > *bc62a8a72124a4ba3f6b9f302587903c*.,
new regions:
> > > > > >> >> > > >
> > > > > >> >> > > >
> > > > > >> >> > >
> > > > > >> >> >
> > > > > >> >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> ActiveListingRecord16,\x83\x07\xDC\x07\x01Obeo\x00690461,1351824327023.22c3fa48d17aa7312ca53566c680f0fd.,
> > > > > >> >> > > >
> > > > > >> >> > > >
> > > > > >> >> > >
> > > > > >> >> >
> > > > > >> >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> ActiveListingRecord16,\x83\x07\xDC\x07\x11WebsiteIDX\x009024215,1351824327023.b0e0a488c711e5c7f74ee6198a9755a2..
> > > > > >> >> > > > Split took 15hrs, 9mins, 33sec
> > > > > >> >> > > >
> > > > > >> >> > > > 2012-11-02 13:55:00,945 INFO
> > > > > >> >> > > > org.apache.hadoop.hbase.regionserver.SplitRequest:
> Region
> > > > > split,
> > > > > >> META
> > > > > >> >> > > > updated, and report to master.
> > > > > >> >> > > >
> > > > > >> >> > > >
> > > > > >> >> > >
> > > > > >> >> >
> > > > > >> >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> Parent=ActiveListingRecord16,\x83\x07\xDC\x06\x0EThreeWide\x00SWMRIC-11001540,1351790329631.f720436a6f8fd412d76fe3255f24e3b3.,
> > > > > >> >> > > > new regions:
> > > > > >> >> > > >
> > > > > >> >> > > >
> > > > > >> >> > >
> > > > > >> >> >
> > > > > >> >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> ActiveListingRecord16,\x83\x07\xDC\x06\x0EThreeWide\x00SWMRIC-11001540,1351816858693.2880cf893175d2a852947c63ee8554a3.,
> > > > > >> >> > > >
> > > > > >> >> >
> > > > > >>
> > > >
> > ActiveListingRecord16,\x83\x07\xDC\x07\x01Obeo\x00690461,1351816858693.*
> > > > > >> >> > > > bc62a8a72124a4ba3f6b9f302587903c*..
Split took 17hrs,
> > > 14mins,
> > > > > 2sec
> > > > > >> >> > > > ----------------------
> > > > > >> >> > > >
> > > > > >> >> > > > And the master finally logs this:
> > > > > >> >> > > > -----------------------
> > > > > >> >> > > > 2012-11-02 13:55:00,783 WARN
> > > > > >> >> > > > org.apache.hadoop.hbase.master.AssignmentManager:
> Region
> > > > > >> >> > > > bc62a8a72124a4ba3f6b9f302587903c
not found on server
> > > > > >> >> > > > HadoopNode162.hotpads.srv,60020,1351788248279;
failed
> > > > > processing
> > > > > >> >> > > >
> > > > > >> >> > > > 2012-11-02 13:55:00,783 WARN
> > > > > >> >> > > > org.apache.hadoop.hbase.master.AssignmentManager:
> > Received
> > > > > SPLIT
> > > > > >> for
> > > > > >> >> > > region
> > > > > >> >> > > > bc62a8a72124a4ba3f6b9f302587903c
from server
> > > > > >> >> > > > HadoopNode162.hotpads.srv,60020,1351788248279
but it
> > > doesn't
> > > > > exist
> > > > > >> >> > > anymore,
> > > > > >> >> > > > probably already processed its split
> > > > > >> >> > > > ------------------------
> > > > > >> >> > > >
> > > > > >> >> > > > I can't find any evidence that the
region left the node
> > > > during
> > > > > >> that
> > > > > >> >> > time.
> > > > > >> >> > > > I'll have to catch it in action
next time and see what
> > the
> > > > > region
> > > > > >> is
> > > > > >> >> up
> > > > > >> >> > > to
> > > > > >> >> > > > during the problem period.
> > > > > >> >> > > >
> > > > > >> >> > > > What does it mean that it successfully
transitioned
> from
> > > > SPLIT
> > > > > to
> > > > > >> >> > SPLIT?
> > > > > >> >> > > > Is that a valid transition?
> > > > > >> >> > > >
> > > > > >> >> > > > Matt
> > > > > >> >> > > >
> > > > > >> >> > > >
> > > > > >> >> > > >
> > > > > >> >> > > > On Sat, Nov 3, 2012 at 2:55 PM,
Ted Yu <
> > > yuzhihong@gmail.com>
> > > > > >> wrote:
> > > > > >> >> > > >
> > > > > >> >> > > > > Matt:
> > > > > >> >> > > > > From the following we can see
that region
> > > > > >> >> > > > bc62a8a72124a4ba3f6b9f302587903c
> > > > > >> >> > > > > cannot be found:
> > > > > >> >> > > > >
> > > > > >> >> > > > > 2012-11-02 00:00:02,909 DEBUG
> > > > > >> >> > > > > org.apache.hadoop.hbase.master.AssignmentManager:
> > > Handling
> > > > > >> >> > > > > transition=RS_ZK_REGION_SPLIT,
> > > > > >> >> > > > > server=HadoopNode162.hotpads.srv,60020,1351788248279,
> > > > > >> >> > > > > region=bc62a8a72124a4ba3f6b9f302587903c
> > > > > >> >> > > > > 2012-11-02 00:00:02,909 WARN
> > > > > >> >> > > > > org.apache.hadoop.hbase.master.AssignmentManager:
> > Region
> > > > > >> >> > > > > bc62a8a72124a4ba3f6b9f302587903c
*not found on server
> > > > > >> >> > > > > HadoopNode162.hotpads*.srv,60020,1351788248279;failed
> > > > > >> >> > > > > processing
> > > > > >> >> > > > > 2012-11-02 00:00:02,909 WARN
> > > > > >> >> > > > > org.apache.hadoop.hbase.master.AssignmentManager:
> > > Received
> > > > > SPLIT
> > > > > >> >> for
> > > > > >> >> > > > region
> > > > > >> >> > > > > bc62a8a72124a4ba3f6b9f302587903c
from server
> > > > > >> >> > > > > HadoopNode162.hotpads.srv,60020,1351788248279
but it
> > > > doesn't
> > > > > >> exist
> > > > > >> >> > > > anymore,
> > > > > >> >> > > > > probably already processed
its split
> > > > > >> >> > > > >
> > > > > >> >> > > > > Have you run hbck to repair
your cluster ?
> > > > > >> >> > > > >
> > > > > >> >> > > > > Thanks
> > > > > >> >> > > > >
> > > > > >> >> > > > > On Sat, Nov 3, 2012 at 2:29
PM, Matt Corgan <
> > > > > >> mcorgan@hotpads.com>
> > > > > >> >> > > wrote:
> > > > > >> >> > > > >
> > > > > >> >> > > > > > Here's a sample of the
master's logs from
> yesterday.
> > > >  It's
> > > > > not
> > > > > >> >> > > > correlated
> > > > > >> >> > > > > > exactly with the other
pastebin log, but there's
> 3GB
> > of
> > > > > this
> > > > > >> from
> > > > > >> >> > > > > > yesterday: http://pastebin.com/wP2rNN1t
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > I'm am pushing the cluster
a bit with importing
> data
> > so
> > > > > >> testing
> > > > > >> >> the
> > > > > >> >> > > > split
> > > > > >> >> > > > > > code harder than normal.
 The regions are 500-1GB
> > > > gzipped.
> > > > >  I
> > > > > >> can
> > > > > >> >> > > look
> > > > > >> >> > > > > into
> > > > > >> >> > > > > > it more but trying to
figure out what to look for.
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > Thanks Ted,
> > > > > >> >> > > > > > Matt
> > > > > >> >> > > > > >
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > On Sat, Nov 3, 2012 at
2:03 PM, Ted Yu <
> > > > > yuzhihong@gmail.com>
> > > > > >> >> > wrote:
> > > > > >> >> > > > > >
> > > > > >> >> > > > > > > Matt:
> > > > > >> >> > > > > > > This is the method
which made the logging:
> > > > > >> >> > > > > > >   private static
int
> > tickleNodeSplit(ZooKeeperWatcher
> > > > > zkw,
> > > > > >> >> > > > > > >       HRegionInfo
parent, HRegionInfo a,
> > HRegionInfo
> > > b,
> > > > > >> >> > ServerName
> > > > > >> >> > > > > > > serverName,
> > > > > >> >> > > > > > >       final int znodeVersion)
> > > > > >> >> > > > > > >   throws KeeperException,
IOException {
> > > > > >> >> > > > > > >     byte [] payload
= Writables.getBytes(a, b);
> > > > > >> >> > > > > > >     return ZKAssign.transitionNode(zkw,
parent,
> > > > > serverName,
> > > > > >> >> > > > > > >       EventType.RS_ZK_REGION_SPLIT,
> > > > > >> >> EventType.RS_ZK_REGION_SPLIT,
> > > > > >> >> > > > > > >       znodeVersion,
payload);
> > > > > >> >> > > > > > >   }
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > transitionZKNode()
calls tickleNodeSplit() when
> > > waiting
> > > > > for
> > > > > >> >> > master
> > > > > >> >> > > to
> > > > > >> >> > > > > > split
> > > > > >> >> > > > > > > the region. Obviously
something caused the master
> > not
> > > > > able
> > > > > >> to
> > > > > >> >> > > split.
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > How large is the
region ?
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > Can you pastebin
master log for that period of
> > time ?
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > Thanks
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > On Sat, Nov 3, 2012
at 1:54 PM, Matt Corgan <
> > > > > >> >> mcorgan@hotpads.com
> > > > > >> >> > >
> > > > > >> >> > > > > wrote:
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > > > > We upgraded
from .94.0 to .94.2 last week and
> > have
> > > > > >> started to
> > > > > >> >> > > > > encounter
> > > > > >> >> > > > > > > > infinite loops
of region-transition on splits.
> > I'm
> > > > not
> > > > > >> sure
> > > > > >> >> > yet
> > > > > >> >> > > if
> > > > > >> >> > > > > > it's
> > > > > >> >> > > > > > > > all splits nor
if it's related to load.
> Solution
> > > so
> > > > > far
> > > > > >> has
> > > > > >> >> > been
> > > > > >> >> > > > to
> > > > > >> >> > > > > > > > restart the
regionserver process.
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > log snippet:
> > > > > >> >> > > > > > > > http://pastebin.com/LpienZ7B
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > It's repeating
these two lines:
> > > > > >> >> > > > > > > > 2012-11-02 01:35:33,312
DEBUG
> > > > > >> >> > > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > >> >> > > > > > > > regionserver:60020-0x13ab46479832b76
Attempting
> > to
> > > > > >> transition
> > > > > >> >> > > node
> > > > > >> >> > > > > > > > cf3e9bc069e1888983c06dc8e053ffcf
from
> > > > > RS_ZK_REGION_SPLIT
> > > > > >> to
> > > > > >> >> > > > > > > > RS_ZK_REGION_SPLIT
> > > > > >> >> > > > > > > > 2012-11-02 01:35:33,364
DEBUG
> > > > > >> >> > > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > >> >> > > > > > > > regionserver:60020-0x13ab46479832b76
> Successfully
> > > > > >> >> transitioned
> > > > > >> >> > > node
> > > > > >> >> > > > > > > > cf3e9bc069e1888983c06dc8e053ffcf
from
> > > > > RS_ZK_REGION_SPLIT
> > > > > >> to
> > > > > >> >> > > > > > > > RS_ZK_REGION_SPLIT
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > with the occasional:
> > > > > >> >> > > > > > > > 2012-11-02 01:35:34,476
DEBUG
> > > > > >> >> > > > > > > >
> > > > org.apache.hadoop.hbase.regionserver.SplitTransaction:
> > > > > >> Still
> > > > > >> >> > > > waiting
> > > > > >> >> > > > > on
> > > > > >> >> > > > > > > the
> > > > > >> >> > > > > > > > master to process
the split for
> > > > > >> >> > cf3e9bc069e1888983c06dc8e053ffcf
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > Should the region
transition from
> > > RS_ZK_REGION_SPLIT
> > > > to
> > > > > >> >> itself?
> > > > > >> >> > >  It
> > > > > >> >> > > > > > looks
> > > > > >> >> > > > > > > > wrong, but I'm
not familiar with the code at
> all.
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > > > Thanks,
> > > > > >> >> > > > > > > > Matt
> > > > > >> >> > > > > > > >
> > > > > >> >> > > > > > >
> > > > > >> >> > > > > >
> > > > > >> >> > > > >
> > > > > >> >> > > >
> > > > > >> >> > >
> > > > > >> >> >
> > > > > >> >>
> > > > > >>
> > > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message