hbase-dev mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: MemStoreFlusher and my initial experience with HBase 0.90.1 rc0
Date Tue, 15 Feb 2011 01:23:01 GMT
Looks like it's just in the process of flushing something:

"regionserver60020.cacheFlusher" daemon prio=10 tid=0x0000000050ce3000 nid=0x32a9 runnable [0x0000000043138000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:797)
        at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:479)
        - locked <0x000000072aecd6b8> (a java.lang.Object)
        at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:448)
        at org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:81)
        at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1508)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:967)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:894)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:846)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:386)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:194)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:214)

i.e. this is expected behavior: the flusher thread is RUNNABLE and actively
writing a store file, not blocked. Does the region server not go down eventually?
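
The difference between a flusher that is busy and one that is stuck can be
sketched in plain Java. This is only an illustration, not HBase code: the
class FlushLockSketch, the thread names, and the single shared ReentrantLock
are hypothetical, and it assumes the flusher holds the lock that the handler
parks on, which the traces alone do not prove. The flusher spins to stand in
for StoreFile writes and shows up as RUNNABLE; the handler parks in
ReentrantLock.lock() and shows up as WAITING, then proceeds once the flush ends.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

class FlushLockSketch {
    /**
     * Runs one "flush" while one "handler" waits for it.
     * Returns {flusher state during flush, handler state during flush,
     * whether the handler got the lock after the flush finished}.
     */
    static String[] run() throws InterruptedException {
        ReentrantLock flushLock = new ReentrantLock();      // hypothetical shared lock
        AtomicBoolean flushDone = new AtomicBoolean(false);
        AtomicBoolean handlerProceeded = new AtomicBoolean(false);
        CountDownLatch flushStarted = new CountDownLatch(1);

        Thread flusher = new Thread(() -> {
            flushLock.lock();
            try {
                flushStarted.countDown();
                long appended = 0;
                // Busy work stands in for appending to a store file.
                while (!flushDone.get()) {
                    appended++;
                }
            } finally {
                flushLock.unlock();
            }
        }, "sketch.cacheFlusher");

        Thread handler = new Thread(() -> {
            flushLock.lock();   // parks here while the flusher holds the lock
            try {
                handlerProceeded.set(true);
            } finally {
                flushLock.unlock();
            }
        }, "sketch.handler");

        flusher.start();
        flushStarted.await();   // flusher now owns the lock
        handler.start();
        // Wait until the handler has actually parked on the lock.
        while (handler.getState() != Thread.State.WAITING) {
            Thread.sleep(1);
        }

        String flusherState = flusher.getState().toString();
        String handlerState = handler.getState().toString();

        flushDone.set(true);    // the flush completes
        flusher.join();
        handler.join();
        return new String[] {
            flusherState, handlerState, String.valueOf(handlerProceeded.get())
        };
    }

    public static void main(String[] args) throws InterruptedException {
        String[] r = run();
        System.out.println("flusher=" + r[0] + " handler=" + r[1]
                + " handlerProceeded=" + r[2]);
    }
}
```

A parked handler next to a RUNNABLE flusher is therefore not a deadlock by
itself; the concerning pattern is the earlier dump, where the flusher thread
was BLOCKED and nothing was making progress.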

On Mon, Feb 14, 2011 at 5:17 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> I applied Todd's patch from HBASE-3531 locally and built hbase jar.
>
> I restarted cluster. The first flow over 200GB data succeeded.
> The second flow got stuck. So I tried to shutdown the cluster.
>
> Here is stack trace for one of the region servers that refused to go down:
> http://pastebin.com/yJQhhYp8
>
> This is particularly interesting (other threads were waiting to lock
> 0x000000070be8a028):
>
>   "IPC Server handler 5 on 60020" daemon prio=10 tid=0x00002aaab883c000 nid=0x32bb waiting on condition [0x0000000043f46000]
>      java.lang.Thread.State: WAITING (parking)
>           at sun.misc.Unsafe.park(Native Method)
>           - parking to wait for  <0x000000070ca468a8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>           at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>           at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
>           at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
>           at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
>           at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>           at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>           at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:429)
>           - locked <0x000000070be8a028> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>           at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2562)
>
> Thanks for your attention.
>
> On Mon, Feb 14, 2011 at 1:12 PM, Todd Lipcon <todd@cloudera.com> wrote:
>
> > Gotcha, OK. I think I understand what's going on then.
> > I'm guessing you have a TON of log messages like this?
> >
> > > 2011-02-14 06:27:23,116 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing memstore for region NIGHTLYDEVGRIDSGRIDSQL-THREEGPPSPEECHCALLS-1297661454931,E\xC2\xC8\xBC\xD4\xA5\x04E\xC2\xC8\xBC\xD4\xA5\x04E\xC2\xC8\xBC\xD4\xA5\x04E\xC2\xC8\xBC\xD4\xA5\x04E\xC2\xC8\xAA,1297662719973.cd06c5f9e4a0ffcc16cb6a5e559cd5b3., flushing=false, writesEnabled=false
> >
> >
> > -Todd
> >
> > On Mon, Feb 14, 2011 at 1:08 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > The lock up happened before I shut down region server.
> > > I had to get out of that situation so that I can continue 0.90.1
> > > validation.
> > >
> > > > On Mon, Feb 14, 2011 at 12:59 PM, Todd Lipcon <todd@cloudera.com> wrote:
> > > >
> > > > > Hi Ted,
> > > > >
> > > > > What was the reason for the region server shutdown? I.e., was it
> > > > > aborting itself, or did you send it a kill signal, or what?
> > > >
> > > > Still trying to understand why this happened.
> > > >
> > > > -Todd
> > > >
> > > > > On Mon, Feb 14, 2011 at 10:40 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > >
> > > > > > Here is the stack trace for one region server which didn't shut down
> > > > > > cleanly:
> > > > > > http://pastebin.com/PEtEdi4g
> > > > > >
> > > > > > I noticed IPC Server handler 7 was holding the lock in
> > > > > > reclaimMemStoreMemory().
> > > > > >
> > > > > > Here is the related snippet from the region server's log:
> > > > > > http://pastebin.com/KeXppURX
> > > > > >
> > > > > > I noticed region 1297662719973.cd06c5f9e4a0ffcc16cb6a5e559cd5b3 was
> > > > > > splitting.
> > > > > >
> > > > > > Please advise whether there could be any relation between the above
> > > > > > snippet and the lockup in MemStoreFlusher.
> > > > >
> > > > > Thanks
> > > > >
> > > > > > On Mon, Feb 14, 2011 at 9:20 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > >
> > > > > > > I disabled MSLAB.
> > > > > > > My flow still couldn't make much progress.
> > > > > > >
> > > > > > >
> > > > > > >> In this region server stack trace, I don't see
> > > > > > >> MemStoreFlusher.reclaimMemStoreMemory() call:
> > > > > > >> http://pastebin.com/uiBRidUa
> > > > > > >>
> > > > > > >>
> > > > > > >> On Sun, Feb 13, 2011 at 1:14 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > > >>
> > > > > > >>> I am using hadoop-core-0.20.2-322.jar downloaded from Ryan's repo.
> > > > > > >>> FYI
> > > > > >>>
> > > > > >>>
> > > > > > >>> On Sun, Feb 13, 2011 at 1:12 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > > >>>
> > > > > > >>>> Since master server shut down, I restarted the cluster.
> > > > > > >>>> The next flow over 200GB data timed out.
> > > > > > >>>>
> > > > > > >>>> Here are some region server stats:
> > > > > > >>>>
> > > > > > >>>> request=0.0, regions=95, stores=213, storefiles=65,
> > > > > > >>>> storefileIndexSize=99, memstoreSize=1311, compactionQueueSize=0,
> > > > > > >>>> flushQueueSize=0, usedHeap=2532, maxHeap=3983, blockCacheSize=6853968,
> > > > > > >>>> blockCacheFree=828520304, blockCacheCount=0, blockCacheHitCount=0,
> > > > > > >>>> blockCacheMissCount=0, blockCacheEvictedCount=0, blockCacheHitRatio=0,
> > > > > > >>>> blockCacheHitCachingRatio=0
> > > > > > >>>>
> > > > > > >>>> request=0.0, regions=95, stores=232, storefiles=72,
> > > > > > >>>> storefileIndexSize=120, memstoreSize=301, compactionQueueSize=0,
> > > > > > >>>> flushQueueSize=0, usedHeap=1740, maxHeap=3983, blockCacheSize=13110928,
> > > > > > >>>> blockCacheFree=822263344, blockCacheCount=712, blockCacheHitCount=112478,
> > > > > > >>>> blockCacheMissCount=712, blockCacheEvictedCount=0, blockCacheHitRatio=99,
> > > > > > >>>> blockCacheHitCachingRatio=99
> > > > > >>>>
> > > > > >>>> Thanks
> > > > > >>>>
> > > > > >>>>
> > > > > > >>>> On Sun, Feb 13, 2011 at 12:24 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> > > > > > >>>>
> > > > > > >>>>> Every handler thread, every reader, and also the accept thread are
> > > > > > >>>>> all blocked on flushing memstore.  The handlers get blocked, then the
> > > > > > >>>>> readers, which have a finite handoff queue, get blocked too, and
> > > > > > >>>>> finally the accept thread.
> > > > > > >>>>>
> > > > > > >>>>> But why isn't memstore flushing?  Do you have regionserver stats?  I.e.,
> > > > > > >>>>> how much global memstore RAM is used?  That is found on the main page of
> > > > > > >>>>> the regionserver http service, and also in the ganglia/file stats.
> > > > > > >>>>>
> > > > > > >>>>> I haven't looked at the logs yet, I'm off to lunch now.
> > > > > >>>>>
> > > > > >>>>> -ryan
> > > > > >>>>>
> > > > > > >>>>> On Sun, Feb 13, 2011 at 8:44 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > > >>>>> > I had 3 consecutive successful runs processing 200GB data for each run
> > > > > > >>>>> > before hitting timeout problem in the 4th run.
> > > > > > >>>>> >
> > > > > > >>>>> > The 5th run couldn't proceed because master complained:
> > > > > > >>>>> >
> > > > > > >>>>> > 2011-02-13 16:11:45,173 FATAL org.apache.hadoop.hbase.master.HMaster: Failed
> > > > > > >>>>> > assignment of regions to
> > > > > > >>>>> > serverName=sjc1-hadoop6.sjc1.carrieriq.com,60020,1297518996557,
> > > > > > >>>>> > load=(requests=0, regions=231, usedHeap=3535, maxHeap=3983)
> > > > > > >>>>> >
> > > > > > >>>>> > but sjc1-hadoop6.sjc1 claimed:
> > > > > > >>>>> > 2011-02-13 16:13:32,258 DEBUG
> > > > > > >>>>> > org.apache.hadoop.hbase.regionserver.HRegionServer: No master found, will
> > > > > > >>>>> > retry
> > > > > > >>>>> >
> > > > > > >>>>> > Here is stack trace for sjc1-hadoop6.sjc1:
> > > > > > >>>>> > http://pastebin.com/X8zWLXqu
> > > > > > >>>>> >
> > > > > > >>>>> > I didn't have chance to capture master stack trace as master exited after
> > > > > > >>>>> > that.
> > > > > > >>>>> >
> > > > > > >>>>> > I also attach master and region server log on sjc1-hadoop6.sjc1 - pardon me
> > > > > > >>>>> > for including individual email addresses as attachments wouldn't go through
> > > > > > >>>>> > hbase.apache.org
> > > > > > >>>>> > On Thu, Feb 10, 2011 at 5:05 PM, Todd Lipcon <todd@cloudera.com> wrote:
> > > > > > >>>>> >>
> > > > > > >>>>> >> On Thu, Feb 10, 2011 at 4:54 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > > >>>>> >>
> > > > > > >>>>> >> > Thanks for the explanation.
> > > > > > >>>>> >> > Assuming the mixed class loading is static, why did this situation
> > > > > > >>>>> >> > develop after 40 minutes of heavy load :-(
> > > > > > >>>>> >> >
> > > > > > >>>>> >>
> > > > > > >>>>> >> You didn't hit global memstore pressure until 40 minutes of load.
> > > > > >>>>> >>
> > > > > >>>>> >> -Todd
> > > > > >>>>> >>
> > > > > > >>>>> >> On Thu, Feb 10, 2011 at 4:42 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> > > > > > >>>>> >> >
> > > > > > >>>>> >> > > It's a standard linking issue, you get one class from one version
> > > > > > >>>>> >> > > another from another, they are mostly compatible in terms of
> > > > > > >>>>> >> > > signatures (hence no exceptions) but are subtly incompatible in
> > > > > > >>>>> >> > > different ways. In the stack trace you posted, the handlers were
> > > > > > >>>>> >> > > blocked in:
> > > > > > >>>>> >> > >
> > > > > > >>>>> >> > >        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:382)
> > > > > > >>>>> >> > >
> > > > > > >>>>> >> > > and the thread:
> > > > > > >>>>> >> > >
> > > > > > >>>>> >> > > "regionserver60020.cacheFlusher" daemon prio=10 tid=0x00002aaabc21e000 nid=0x7717 waiting for monitor entry [0x0000000000000000]
> > > > > > >>>>> >> > >   java.lang.Thread.State: BLOCKED (on object monitor)
> > > > > > >>>>> >> > >
> > > > > > >>>>> >> > > was idle.
> > > > > > >>>>> >> > >
> > > > > > >>>>> >> > > The cache flusher thread should be flushing, and yet it's doing
> > > > > > >>>>> >> > > nothing.  This also happens to be one of the classes that were
> > > > > > >>>>> >> > > changed.
> > > > > >>>>> >> > >
> > > > > >>>>> >> > >
> > > > > >>>>> >> > >
> > > > > > >>>>> >> > > On Thu, Feb 10, 2011 at 4:34 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > > >>>>> >> > > > Can someone comment on my second question ?
> > > > > > >>>>> >> > > > Thanks
> > > > > >>>>> >> > > >
> > > > > > >>>>> >> > > > On Thu, Feb 10, 2011 at 4:25 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> > > > > > >>>>> >> > > >
> > > > > > >>>>> >> > > >> As I suspected.
> > > > > > >>>>> >> > > >>
> > > > > > >>>>> >> > > >> It's a byproduct of our maven assembly process. The process could
> > > > > > >>>>> >> > > >> be fixed. I wouldn't mind. I don't support runtime checking of jars,
> > > > > > >>>>> >> > > >> there is such a thing as too many tests, and this is an example of
> > > > > > >>>>> >> > > >> it. The check would then need a test, etc, etc.
> > > > > > >>>>> >> > > >>
> > > > > > >>>>> >> > > >> At SU we use new directories for each upgrade, copying the config
> > > > > > >>>>> >> > > >> over. With the lack of -default.xml this is easier than ever (just
> > > > > > >>>>> >> > > >> copy everything in conf/).  With symlink switchover it makes roll
> > > > > > >>>>> >> > > >> forward/back as simple as doing a symlink switchover or back. I
> > > > > > >>>>> >> > > >> have to recommend this to everyone who doesn't have a management
> > > > > > >>>>> >> > > >> scheme.
> > > > > >>>>> >> > > >>
> > > > > > >>>>> >> > > >> On Thu, Feb 10, 2011 at 4:20 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > > >>>>> >> > > >> > hbase/hbase-0.90.1.jar leads lib/hbase-0.90.0.jar in the
> > > > > > >>>>> >> > > >> > classpath.
> > > > > > >>>>> >> > > >> > I wonder
> > > > > > >>>>> >> > > >> > 1. why hbase jar is placed in two directories - 0.20.6 didn't use
> > > > > > >>>>> >> > > >> > such structure
> > > > > > >>>>> >> > > >> > 2. what from lib/hbase-0.90.0.jar could be picked up and why there
> > > > > > >>>>> >> > > >> > wasn't exception in server log
> > > > > > >>>>> >> > > >> >
> > > > > > >>>>> >> > > >> > I think a JIRA should be filed for item 2 above - bail out when
> > > > > > >>>>> >> > > >> > the two hbase jars from $HBASE_HOME and $HBASE_HOME/lib are of
> > > > > > >>>>> >> > > >> > different versions.
> > > > > > >>>>> >> > > >> >
> > > > > > >>>>> >> > > >> > Cheers
> > > > > >>>>> >> > > >> >
> > > > > > >>>>> >> > > >> > On Thu, Feb 10, 2011 at 3:40 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> > > > > > >>>>> >> > > >> >
> > > > > > >>>>> >> > > >> >> What do you get when you:
> > > > > > >>>>> >> > > >> >>
> > > > > > >>>>> >> > > >> >> ls lib/hbase*
> > > > > > >>>>> >> > > >> >>
> > > > > > >>>>> >> > > >> >> I'm going to guess there is hbase-0.90.0.jar there
> > > > > >>>>> >> > > >> >>
> > > > > >>>>> >> > > >> >>
> > > > > >>>>> >> > > >> >>
> > > > > > >>>>> >> > > >> >> On Thu, Feb 10, 2011 at 3:25 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > > >>>>> >> > > >> >> > hbase-0.90.0-tests.jar and hbase-0.90.1.jar co-exist
> > > > > > >>>>> >> > > >> >> > Would this be a problem ?
> > > > > >>>>> >> > > >> >> >
> > > > > > >>>>> >> > > >> >> > On Thu, Feb 10, 2011 at 3:16 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> > > > > > >>>>> >> > > >> >> >
> > > > > > >>>>> >> > > >> >> >> You don't have both the old and the new hbase jars in there
> > > > > > >>>>> >> > > >> >> >> do you?
> > > > > > >>>>> >> > > >> >> >>
> > > > > > >>>>> >> > > >> >> >> -ryan
> > > > > >>>>> >> > > >> >> >>
> > > > > >>>>> >> > > >> >> >>
On Thu, Feb 10, 2011 at 3:12 PM, Ted Yu <
> > > > > >>>>> yuzhihong@gmail.com>
> > > > > >>>>> >> > > wrote:
> > > > > >>>>> >> > > >> >> >>
> .META. went offline during second flow
> > attempt.
> > > > > >>>>> >> > > >> >> >>
>
> > > > > >>>>> >> > > >> >> >>
> The time out I mentioned happened for 1st
> and
> > > 3rd
> > > > > >>>>> attempts.
> > > > > >>>>> >> > > HBase
> > > > > >>>>> >> > > >> was
> > > > > >>>>> >> > > >> >> >>
> restarted before the 1st and 3rd attempts.
> > > > > >>>>> >> > > >> >> >>
>
> > > > > >>>>> >> > > >> >> >>
> Here is jstack:
> > > > > >>>>> >> > > >> >> >>
> http://pastebin.com/EHMSvsRt
> > > > > >>>>> >> > > >> >> >>
>
> > > > > >>>>> >> > > >> >> >>
> On Thu, Feb 10, 2011 at 3:04 PM, Stack <
> > > > > >>>>> stack@duboce.net>
> > > > > >>>>> >> > > wrote:
> > > > > >>>>> >> > > >> >> >>
>
> > > > > >>>>> >> > > >> >> >>
>> So, .META. is not online?  What happens if
> > you
> > > > use
> > > > > >>>>> shell
> > > > > >>>>> >> > > >> >> >>
>> at
> > > > > >>>>> >> > > this
> > > > > >>>>> >> > > >> >> time.
> > > > > >>>>> >> > > >> >> >>
>>
> > > > > >>>>> >> > > >> >> >>
>> Your attachement did not come across Ted.
> > >  Mind
> > > > > >>>>> >> > > >> >> >>
>> postbin'ing
> > > > > >>>>> >> > it?
> > > > > >>>>> >> > > >> >> >>
>>
> > > > > >>>>> >> > > >> >> >>
>> St.Ack
> > > > > >>>>> >> > > >> >> >>
>>
> > > > > >>>>> >> > > >> >> >>
>> On Thu, Feb 10, 2011 at 2:41 PM, Ted Yu
> > > > > >>>>> >> > > >> >> >>
>> <yuzhihong@gmail.com
> > > > > >>>>> >> > >
> > > > > >>>>> >> > > >> wrote:
> > > > > >>>>> >> > > >> >> >>
>> > I replaced hbase jar with
> hbase-0.90.1.jar
> > > > > >>>>> >> > > >> >> >>
>> > I also upgraded client side jar to
> > > > > >>>>> hbase-0.90.1.jar
> > > > > >>>>> >> > > >> >> >>
>> >
> > > > > >>>>> >> > > >> >> >>
>> > Our map tasks were running faster than
> > > before
> > > > > for
> > > > > >>>>> about
> > > > > >>>>> >> > > >> >> >>
>> > 50
> > > > > >>>>> >> > > >> minutes.
> > > > > >>>>> >> > > >> >> >>
>> However,
> > > > > >>>>> >> > > >> >> >>
>> > map tasks then timed out calling
> > > > flushCommits().
> > > > > >>>>> This
> > > > > >>>>> >> > > happened
> > > > > >>>>> >> > > >> even
> > > > > >>>>> >> > > >> >> >>
after
> > > > > >>>>> >> > > >> >> >>
>> > fresh restart of hbase.
> > > > > >>>>> >> > > >> >> >>
>> >
> > > > > >>>>> >> > > >> >> >>
>> > I don't see any exception in region
> server
> > > > logs.
> > > > > >>>>> >> > > >> >> >>
>> >
> > > > > >>>>> >> > > >> >> >>
>> > In master log, I found:
> > > > > >>>>> >> > > >> >> >>
>> >
> > > > > >>>>> >> > > >> >> >>
>> > 2011-02-10 18:24:15,286 DEBUG
> > > > > >>>>> >> > > >> >> >>
>> >
> > > > > >>>>> >> > > >> >> >>
>> >
> > > > > >>>>> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler:
> > > > > >>>>> >> > > >> Opened
> > > > > >>>>> >> > > >> >> >>
region
> > > > > >>>>> >> > > >> >> >>
>> > -ROOT-,,0.70236052 on
> sjc1-hadoop6.X.com
> > > > > >>>>> >> > ,60020,1297362251595
> > > > > >>>>> >> > > >> >> >>
>> > 2011-02-10 18:24:15,349 INFO
> > > > > >>>>> >> > > >> >> >>
>>
> > > org.apache.hadoop.hbase.catalog.CatalogTracker:
> > > > > >>>>> >> > > >> >> >>
>> > Failed verification of .META.,,1 at
> > > > > address=null;
> > > > > >>>>> >> > > >> >> >>
>> >
> > > > > org.apache.hadoop.hbase.NotServingRegionException:
> > > > > >>>>> >> > > >> >> >>
>> >
> > > > > org.apache.hadoop.hbase.NotServingRegionException:
> > > > > >>>>> >> > > >> >> >>
>> > Region
> > > > > >>>>> >> > is
> > > > > >>>>> >> > > not
> > > > > >>>>> >> > > >> >> >>
online:
> > > > > >>>>> >> > > >> >> >>
>> > .META.,,1
> > > > > >>>>> >> > > >> >> >>
>> > 2011-02-10 18:24:15,350 DEBUG
> > > > > >>>>> >> > > >> >> >>
org.apache.hadoop.hbase.zookeeper.ZKAssign:
> > > > > >>>>> >> > > >> >> >>
>> > master:60000-0x12e10d0e31e0000 Creating
> > (or
> > > > > >>>>> updating)
> > > > > >>>>> >> > > unassigned
> > > > > >>>>> >> > > >> >> node
> > > > > >>>>> >> > > >> >> >>
for
> > > > > >>>>> >> > > >> >> >>
>> > 1028785192 with OFFLINE state
> > > > > >>>>> >> > > >> >> >>
>> >
> > > > > >>>>> >> > > >> >> >>
>> > I am attaching region server (which
> didn't
> > > > > respond
> > > > > >>>>> to
> > > > > >>>>> >> > > >> >> stop-hbase.sh)
> > > > > >>>>> >> > > >> >> >>
>> jstack.
> > > > > >>>>> >> > > >> >> >>
>> >
> > > > > >>>>> >> > > >> >> >>
>> > FYI
> > > > > >>>>> >> > > >> >> >>
>> >
> > > > > >>>>> >> > > >> >> >>
>> > On Thu, Feb 10, 2011 at 10:10 AM, Stack
> > > > > >>>>> >> > > >> >> >>
>> > <stack@duboce.net>
> > > > > >>>>> >> > > >> wrote:
> > > > > >>>>> >> > > >> >> >>
>> >>
> > > > > >>>>> >> > > >> >> >>
>> >> Thats probably enough Ted.  The 0.90.1
> > > > > >>>>> >> > > >> >> >>
>> >> hbase-default.xml
> > > > > >>>>> >> > has
> > > > > >>>>> >> > > an
> > > > > >>>>> >> > > >> >> extra
> > > > > >>>>> >> > > >> >> >>
>> >> config. to enable the experimental
> > > HBASE-3455
> > > > > >>>>> feature
> > > > > >>>>> >> > > >> >> >>
>> >> but
> > > > > >>>>> >> > > you
> > > > > >>>>> >> > > >> can
> > > > > >>>>> >> > > >> >> >>
copy
> > > > > >>>>> >> > > >> >> >>
>> >> that over if you want to try playing
> with
> > > it
> > > > > (it
> > > > > >>>>> >> > > >> >> >>
>> >> defaults
> > > > > >>>>> >> > > off
> > > > > >>>>> >> > > >> so
> > > > > >>>>> >> > > >> >> >>
you'd
> > > > > >>>>> >> > > >> >> >>
>> >> copy over the config. if you wanted to
> > set
> > > it
> > > > > to
> > > > > >>>>> true).
> > > > > >>>>> >> > > >> >> >>
>> >>
> > > > > >>>>> >> > > >> >> >>
>> >> St.Ack
> > > > > >>>>> >> > > >> >> >>
>> >
> > > > > >>>>> >> > > >> >> >>
>> >
> > > > > >>>>> >> > > >> >> >>
>>
> > > > > >>>>> >> > > >> >> >>
>
> > > > > >>>>> >> > > >> >> >>
> > > > > >>>>> >> > > >> >> >
> > > > > >>>>> >> > > >> >>
> > > > > >>>>> >> > > >> >
> > > > > >>>>> >> > > >>
> > > > > >>>>> >> > > >
> > > > > >>>>> >> > >
> > > > > >>>>> >> >
> > > > > >>>>> >>
> > > > > >>>>> >>
> > > > > >>>>> >>
> > > > > >>>>> >> --
> > > > > >>>>> >> Todd Lipcon
> > > > > >>>>> >> Software Engineer, Cloudera
> > > > > >>>>> >
> > > > > >>>>> >
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>
> > > > > >>
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Todd Lipcon
> > > > Software Engineer, Cloudera
> > > >
> > >
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera
