hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: WrongRegionException: How do I recover?
Date Fri, 06 Nov 2009 00:07:42 GMT
Can we have the factory09 datanode log too?  (Sorry, we need all the pieces
because not much info is available when running at INFO level, especially
when debugging stuff like this.)
St.Ack


On Thu, Nov 5, 2009 at 3:56 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> Today on the IRC channel we fixed it with Joost using Stack's tool in
> HBASE-1867. This was caused by a file going missing in the META table
> and we are still investigating why it happened.
>
> So Joost, could you send us your NN's log so we can grep for the file
> names?
>
> Thx,
>
> J-D
>
> On Thu, Nov 5, 2009 at 11:08 AM, Joost Ouwerkerk <joost@openplaces.org>
> wrote:
> > Is there a way to rebuild the META?  I'm really hoping there's no data loss
> > here, and it's just a question of META being out of sync with data...
> > jo
> >
> > On Wed, Nov 4, 2009 at 7:07 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
> >
> >> I investigated following your guidance, Stack.  Unfortunately I am not
> >> seeing evidence of double assignment. It looks more like a case of missing
> >> assignment.  There appear to be key ranges that are not represented in the
> >> .META. table.  So, I have a region that handles keys AAA to BBB, and the
> >> next region handles DDD to EEE.  Now when I try to access key CCC, I get
> >> routed to the region that handles AAA to BBB, presumably because my key is
> >> after AAA and before DDD.  Then HRegion.checkRow fails because the requested
> >> key is outside the region's range.
> >>
> >> Consider this error:
> >>
> >> org.apache.hadoop.hbase.regionserver.WrongRegionException:
> >> org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row out of range for HRegion
> >> crawled_pages,r:http:\x2F\x2Fcom.xxx.yyy\x2Frestaurants\x2Fbasil-in-the-grove,
> >> startKey='r:http:\x2F\x2Fcom.xxx.yyy\x2Frestaurants\x2Fbasil-in-the-grove',
> >> getEndKey()='r:http:\x2F\x2Fcom.xxx.yyy\x2Frestaurants\x2Feast-broward',
> >> row='r:http:\x2F\x2Fcom.xxx.yyy\x2Frestaurants\x2Fhavana-hideout'
> >>
> >> As the error points out, the requested row is outside the range for the
> >> region.  In the .META. table, the next region starts at
> >> 'r:http:\x2F\x2Fcom.xxx.yyy\x2Frestaurants\x2Fpashas-3'.  The requested row
> >> falls after one region's end key and before the next region's start key.
> >>
> >> jo
> >>
> >>
> >> On Wed, Nov 4, 2009 at 4:56 PM, stack <stack@duboce.net> wrote:
> >>
> >>> Meta is giving out the wrong address for a region?  Do a scan of .META.
> >>> It might be easier dumping the scan into a file so you can grep around:
> >>>
> >>> echo "scan '.META.'" | ./bin/hbase shell --format-width=300 &> /tmp/meta.txt
> >>>
> >>> Grep in here for the region that contains the row you are looking for.
> >>> What does it have for info:server?  Go to that regionserver (UI or log).
> >>> Is it carrying the region?  If not, that's what the WRE is about.
> >>>
> >>> For same region, grep its name in master log (hopefully you have DEBUG
> >>> enabled).
> >>>
> >>> What's its history?  Could it have been assigned to one server and then
> >>> another?
> >>>
> >>> If so, close the region in both places.  Type 'tools' in the shell to see
> >>> the doc on the "close_region" command.  You can pass it the server to send
> >>> the close message to.  Close in both places.
> >>>
> >>> If it's a double-assignment issue, our name for the above phenomenon, I
> >>> suggest you upgrade to 0.20.1.  It has at least one pointed fix for this
> >>> scenario (HBASE-1878).
> >>>
> >>> St.Ack
> >>>
> >>>
> >>> On Wed, Nov 4, 2009 at 12:35 PM, Joost Ouwerkerk <joost@openplaces.org> wrote:
> >>>
> >>> > HBase has started throwing WrongRegionExceptions at me when trying to
> >>> > access certain regions.  I'm guessing that the META table has somehow gone
> >>> > out of sync with reality.  I've tried compacting and I've tried restarting,
> >>> > but the problem does not go away.  The errors are always on the same
> >>> > regions.  Has anyone else seen this and succeeded at getting their table
> >>> > back into working order?
> >>> >
> >>> > *Example get:*
> >>> >
> >>> > org.apache.hadoop.hbase.regionserver.WrongRegionException:
> >>> > org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row out of range for HRegion
> >>> > crawled_pages,r:http:\x2F\x2Fcom.xxxx.yyyy\x2Frestaurants\x2Fall-areas\x2Fbeverly-hills\x2Fall-cuisines\x2Ftags\x2Flunch\x2F2\x2F,1256932686084,
> >>> > startKey='r:http:\x2F\x2Fcom.xxxx.yyyy\x2Frestaurants\x2Fall-areas\x2Fbeverly-hills\x2Fall-cuisines\x2Ftags\x2Flunch\x2F2\x2F',
> >>> > getEndKey()='r:http:\x2F\x2Fcom.xxxx.yyyy\x2Frestaurants\x2Fall-areas\x2Fhermosa-beach\x2Fall-cuisines\x2Ftags\x2Foutdoor-dining\x2F',
> >>> > row='r:http:\x2F\x2Fcom.xxxx.yyyy\x2Frestaurants\x2Fall-areas\x2Finglewood\x2Fall-cuisines\x2F'
> >>> >    at org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.java:1522)
> >>> >    at org.apache.hadoop.hbase.regionserver.HRegion.obtainRowLock(HRegion.java:1554)
> >>> >    at org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:1622)
> >>> >    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:2278)
> >>> >    at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1785)
> >>> >    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
> >>> >    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>> >    at java.lang.reflect.Method.invoke(Method.java:597)
> >>> >    at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
> >>> >    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> >>> >
> >>> > *Example put:*
> >>> >
> >>> > put 'crawled_pages','r:http://com.xxxx.yyyy/restaurants/all-areas/inglewood/all-cuisines/', 'curi:test','test'
> >>> > NativeException: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> >>> > Trying to contact region server Some server, retryOnlyOne=true, index=0,
> >>> > islastrow=true, tries=4, numtries=5, i=0, listsize=1,
> >>> > region=crawled_pages,r:http:\x2F\x2Fcom.xxxx.yyyy\x2Frestaurants\x2Fall-areas\x2Fbeverly-hills\x2Fall-cuisines\x2Ftags\x2Flunch\x2F2\x2F,1256932686084
> >>> > for region
> >>> > crawled_pages,r:http:\x2F\x2Fcom.xxxx.yyyy\x2Frestaurants\x2Fall-areas\x2Fbeverly-hills\x2Fall-cuisines\x2Ftags\x2Flunch\x2F2\x2F,1256932686084,
> >>> > row
> >>> > 'r:http:\x2F\x2Fcom.xxxx.yyyy\x2Frestaurants\x2Fall-areas\x2Finglewood\x2Fall-cuisines\x2F',
> >>> > but failed after 5 attempts.
> >>> > Exceptions:
> >>> >
> >>> >    from org/apache/hadoop/hbase/client/HConnectionManager.java:1119:in `process'
> >>> >    from org/apache/hadoop/hbase/client/HConnectionManager.java:1200:in `processBatchOfRows'
> >>> >    from org/apache/hadoop/hbase/client/HTable.java:605:in `flushCommits'
> >>> >    from org/apache/hadoop/hbase/client/HTable.java:470:in `put'
> >>> >    from org/apache/hadoop/hbase/client/HTable.java:1761:in `commit'
> >>> >    from org/apache/hadoop/hbase/client/HTable.java:1742:in `commit'
> >>> >    from sun/reflect/NativeMethodAccessorImpl.java:-2:in `invoke0'
> >>> >    from sun/reflect/NativeMethodAccessorImpl.java:39:in `invoke'
> >>> >    from sun/reflect/DelegatingMethodAccessorImpl.java:25:in `invoke'
> >>> >    from java/lang/reflect/Method.java:597:in `invoke'
> >>> >    from org/jruby/javasupport/JavaMethod.java:298:in `invokeWithExceptionHandling'
> >>> >    from org/jruby/javasupport/JavaMethod.java:259:in `invoke'
> >>> >    from org/jruby/java/invokers/InstanceMethodInvoker.java:44:in `call'
> >>> >    from org/jruby/runtime/callsite/CachingCallSite.java:110:in `call'
> >>> >    from org/jruby/ast/CallOneArgNode.java:57:in `interpret'
> >>> >    from org/jruby/ast/NewlineNode.java:104:in `interpret'
> >>> >
> >>>
> >>
> >>
> >
>
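
The hole Joost describes in the quoted thread (a row that sorts after one
region's end key and before the next region's start key) can be illustrated
with a small Python sketch. This is not HBase code: the region list and the
names locate_region, check_row, and find_holes are hypothetical, and the keys
are the AAA/BBB/DDD/EEE placeholders from the thread. It assumes lookup picks
the region with the greatest start key that does not exceed the row, which
matches the routing described in the thread.

```python
from bisect import bisect_right

# Hypothetical (start_key, end_key) pairs as they might be read from .META.,
# sorted by start key. An empty end key is taken to mean "end of table".
REGIONS = [("AAA", "BBB"), ("DDD", "EEE")]

def locate_region(regions, row):
    """Pick the region with the greatest start key <= row (assumed lookup rule)."""
    starts = [start for start, _ in regions]
    idx = bisect_right(starts, row) - 1
    if idx < 0:
        raise KeyError(f"no region starts at or before {row!r}")
    return regions[idx]

def check_row(region, row):
    """Range check in the spirit of HRegion.checkRow: start <= row < end."""
    start, end = region
    return start <= row and (end == "" or row < end)

def find_holes(regions):
    """Return (end_key, next_start_key) pairs where neighbouring regions do not
    line up. A mismatch can be a gap (as in the thread) or an overlap."""
    holes = []
    for (_, end), (next_start, _) in zip(regions, regions[1:]):
        if end != next_start:
            holes.append((end, next_start))
    return holes

region = locate_region(REGIONS, "CCC")
print(region, check_row(region, "CCC"))   # routed to AAA..BBB, range check fails
print(find_holes(REGIONS))                # the BBB..DDD range has no region
```

Running find_holes over the start and end keys pulled from a .META. scan dump
would be one way to list every uncovered key range before attempting a repair.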
