hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: hbase on s3 and safemode
Date Wed, 07 Oct 2009 20:01:34 GMT
Sorry, I've never run HBase on top of an S3 filesystem, so I can't provide much insight there.

   - Andy




________________________________
From: Ananth T. Sarathy <ananth.t.sarathy@gmail.com>
To: hbase-user@hadoop.apache.org
Sent: Wed, October 7, 2009 12:59:58 PM
Subject: Re: hbase on s3 and safemode

There are all sorts of things in the bucket when I explore it.

We are going to set up .20.0 and point it to a new bucket. Any tips I should
know about to avoid something like this or data loss?

Ananth T Sarathy


On Wed, Oct 7, 2009 at 3:55 PM, Andrew Purtell <apurtell@apache.org> wrote:

> One possibility is you loaded data, but not enough to cause a flush, then
> there appeared to be some network related problem, and you killed the
> regionservers hard (-9?) while the filesystem was unavailable. This
> unfortunate string of circumstances would cause data loss. However you said
> the cluster had been running for 6 days so a major compaction (runs once
> every 24 hours) would have flushed and persisted data. Is there anything in
> the bucket? (hadoop fs -lsr ...)
>
> 0.20 is definitely the way to go, for a number of reasons.
>
>   - Andy
>
>
>
>
> ________________________________
> From: Ananth T. Sarathy <ananth.t.sarathy@gmail.com>
> To: hbase-user@hadoop.apache.org
> Sent: Wed, October 7, 2009 12:46:24 PM
> Subject: Re: hbase on s3 and safemode
>
> thanks for all the help
>
> <property>
>    <name>hbase.rootdir</name>
>    <value>s3://hbase2.s3.amazonaws.com:80/hbasedata</value>
>    <description>The directory shared by region servers.
>    Should be fully-qualified to include the filesystem to use.
>    E.g: hdfs://NAMENODE_SERVER:PORT/HBASE_ROOTDIR
>    </description>
>  </property>
>
> that's in our hbase-site.xml
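
A quick way to sanity-check an hbase.rootdir value like the one quoted above is to parse it as a URI and look at its scheme and path. This is only a minimal sketch: RootDirCheck and its methods are hypothetical names, not part of HBase, and the /tmp test simply mirrors the question asked earlier in this thread about data directories ending up under /tmp.

```java
import java.net.URI;

/** Hypothetical helper for illustration; not an HBase class. */
final class RootDirCheck {
    private RootDirCheck() {}

    /** Returns the filesystem scheme (e.g. "s3", "hdfs"), or null if the value has none. */
    static String scheme(String rootDir) {
        return URI.create(rootDir).getScheme();
    }

    /** True when the path component is /tmp or lives under /tmp. */
    static boolean looksTemporary(String rootDir) {
        String path = URI.create(rootDir).getPath();
        return path != null && (path.equals("/tmp") || path.startsWith("/tmp/"));
    }
}
```

For the value in this thread, scheme(...) gives "s3" and looksTemporary(...) gives false; a value such as file:///tmp/hbase-root/hbase (an assumed example path) would be flagged.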
>
>
> We had been running for about 6 days with no issues. At 1:30 this morning
> it just crapped out.
>
> We are thinking about just moving to 20.0 and starting over.
>
> Ananth T Sarathy
>
>
> On Wed, Oct 7, 2009 at 3:41 PM, Andrew Purtell <apurtell@apache.org>
> wrote:
>
> > Did you edit hbase-site.xml such that HBase data directories are not in
> > /tmp? Maybe a silly question... but it happens sometimes.
> >
> > If your hbase.rootdir points to an HDFS filesystem, what does 'hadoop fs
> > -lsr hdfs://namenode:port/path/to/hbase/root' show?
> >
> > You said this was working before? Did you shut down and bring HBase back up
> > before without trouble? Is this a new install?
> >
> >   - Andy
> >
> >
> >
> >
> >
> > ________________________________
> > From: Ananth T. Sarathy <ananth.t.sarathy@gmail.com>
> > To: hbase-user@hadoop.apache.org
> > Sent: Wed, October 7, 2009 12:34:28 PM
> > Subject: Re: hbase on s3 and safemode
> >
> > ok. so we finally got the regionserver to come up (We killed all the
> > processes on the box and finally the regionserver came back up)
> > but when it did, there is no data in our tables. Though the tables are
> > there.  Any ideas where the data went or how I can get it back?
> >
> > Ananth T Sarathy
> >
> >
> > On Wed, Oct 7, 2009 at 2:46 PM, Andrew Purtell <apurtell@apache.org>
> > wrote:
> >
> > > One option is to add SYSV init scripts that on boot take the following
> > > equivalent actions:
> > >
> > >    hbase-daemon.sh start zookeeper
> > >
> > >    hbase-daemon.sh start master
> > >
> > >    hbase-daemon.sh start regionserver
> > >
> > > Set the respective init scripts to run according to host role.
> > >
> > > This presumes you have also added init scripts that start up DFS daemons
> > > wherever they should be, equivalents to the following:
> > >
> > >    hadoop-daemon.sh start namenode
> > >
> > >    hadoop-daemon.sh start datanode
> > >
> > >    hadoop-daemon.sh start secondarynamenode
> > >
> > > You can start everything up all at once. The respective daemons will wait
> > > for each others' services to become available. Ignore ZK noise in the logs
> > > about connection difficulties unless they persist for minutes.
> > >
> > > If you want to try out the Cloudera Hadoop distribution for 0.20, they have
> > > RPMs that will take care of all of this for you, and we have a RPM for that
> > > platform that I can provide you.
> > >
> > > Do also check your network configuration.
> > >
> > >   - Andy
> > >
> > >
> > >
> > >
> > > ________________________________
> > > From: Ananth T. Sarathy <ananth.t.sarathy@gmail.com>
> > > To: hbase-user@hadoop.apache.org
> > > Sent: Wed, October 7, 2009 11:36:22 AM
> > > Subject: Re: hbase on s3 and safemode
> > >
> > > is there a way to turn my regionservers on implicitly besides
> > > start-hbase.sh?
> > > Ananth T Sarathy
> > >
> > >
> > > On Wed, Oct 7, 2009 at 2:31 PM, Andrew Purtell <apurtell@apache.org>
> > > wrote:
> > >
> > > > HBase won't leave safe mode if the regionservers cannot contact the master.
> > > > So the question is why cannot your regionservers contact the master. If the
> > > > regionserver processes are confirmed running, then it's a firewall or AWS
> > > > Security Groups config problem most likely.
> > > >
> > > > status was a shell command added in 0.20 IIRC.
> > > >
> > > >    - Andy
> > > >
> > > >
> > > >
> > > >
> > > > ________________________________
> > > > From: Ananth T. Sarathy <ananth.t.sarathy@gmail.com>
> > > > To: hbase-user@hadoop.apache.org
> > > > Sent: Wed, October 7, 2009 11:04:03 AM
> > > > Subject: Re: hbase on s3 and safemode
> > > >
> > > > I suppose we need to, but for now it's kind of a pain because we need to
> > > > coordinate our clients.
> > > >
> > > > But the problem is why was it working and all of a sudden it's stuck in
> > > > safemode, and how can we get back up?
> > > >
> > > > Ananth T Sarathy
> > > >
> > > >
> > > > On Wed, Oct 7, 2009 at 1:58 PM, stack <stack@duboce.net> wrote:
> > > >
> > > > > Can you update to 0.20.0? (Oodles of improvements).
> > > > > St.Ack
> > > > >
> > > > > On Wed, Oct 7, 2009 at 10:56 AM, Ananth T. Sarathy <
> > > > > ananth.t.sarathy@gmail.com> wrote:
> > > > >
> > > > > > I get an error
> > > > > >
> > > > > > hbase(main):001:0> status "detailed"
> > > > > > NoMethodError: undefined method `status' for #<Object:0x5585c0de>
> > > > > >        from (hbase):2
> > > > > > hbase(main):002:0> status "detailed"
> > > > > > NoMethodError: undefined method `status' for #<Object:0x5585c0de>
> > > > > >        from (hbase):3
> > > > > >
> > > > > >
> > > > > > we are running 0.19.3
> > > > > >
> > > > > > Ananth T Sarathy
> > > > > >
> > > > > >
> > > > > > > On Wed, Oct 7, 2009 at 1:51 PM, stack <stack@duboce.net> wrote:
> > > > > >
> > > > > > > > This state persists even if you shut down hbase and zk and restart?
> > > > > > > >
> > > > > > > > In shell, do:
> > > > > > > >
> > > > > > > > > status "detailed"
> > > > > > > >
> > > > > > > > At the top there is a section which says regions in transition.
> > > > > > > > Anything there?
> > > > > > >
> > > > > > > St.Ack
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Oct 7, 2009 at 10:35 AM, Ananth T. Sarathy <
> > > > > > > ananth.t.sarathy@gmail.com> wrote:
> > > > > > >
> > > > > > > > Here is the log  since I started it...
> > > > > > > >
> > > > > > > > > Wed Oct  7 13:27:26 EDT 2009 Starting master on ip-10-244-9-171
> > > > > > > > > ulimit -n 1024
> > > > > > > > > 2009-10-07 13:27:26,404 INFO org.apache.hadoop.hbase.master.HMaster: vmName=Java HotSpot(TM) 64-Bit Server VM, vmVendor=Sun Microsystems Inc., vmVersion=14.2-b01
> > > > > > > > > 2009-10-07 13:27:26,405 INFO org.apache.hadoop.hbase.master.HMaster: vmInputArguments=[-Xmx2000m, -XX:+HeapDumpOnOutOfMemoryError, -Djava.io.tmpdir=/mnt/tmp, -Dhbase.log.dir=/mnt/apps/hadoop/hbase/bin/../logs, -Dhbase.log.file=hbase-root-master-ip-10-244-9-171.log, -Dhbase.home.dir=/mnt/apps/hadoop/hbase/bin/.., -Dhbase.id.str=root, -Dhbase.root.logger=INFO,DRFA, -Djava.library.path=/mnt/apps/hadoop/hbase/bin/../lib/native/Linux-amd64-64]
> > > > > > > > > 2009-10-07 13:27:27,525 INFO org.apache.hadoop.hbase.master.HMaster: Root region dir: s3://hbase2.s3.amazonaws.com:80/hbasedata/-ROOT-/70236052
> > > > > > > > > 2009-10-07 13:27:27,751 INFO org.apache.hadoop.hbase.ipc.HBaseRpcMetrics: Initializing RPC Metrics with hostName=HMaster, port=60000
> > > > > > > > > 2009-10-07 13:27:27,827 INFO org.apache.hadoop.hbase.master.HMaster: HMaster initialized on 10.244.9.171:60000
> > > > > > > > > 2009-10-07 13:27:27,829 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=Master, sessionId=HMaster
> > > > > > > > > 2009-10-07 13:27:27,830 INFO org.apache.hadoop.hbase.master.metrics.MasterMetrics: Initialized
> > > > > > > > > 2009-10-07 13:27:27,932 INFO org.mortbay.util.Credential: Checking Resource aliases
> > > > > > > > > 2009-10-07 13:27:27,936 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
> > > > > > > > > 2009-10-07 13:27:27,936 INFO org.mortbay.util.Container: Started HttpContext[/logs,/logs]
> > > > > > > > > 2009-10-07 13:27:28,202 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.WebApplicationHandler@3209fa8f
> > > > > > > > > 2009-10-07 13:27:28,244 INFO org.mortbay.util.Container: Started WebApplicationContext[/static,/static]
> > > > > > > > > 2009-10-07 13:27:28,361 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.WebApplicationHandler@b0c0f66
> > > > > > > > > 2009-10-07 13:27:28,364 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/]
> > > > > > > > > 2009-10-07 13:27:28,636 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.WebApplicationHandler@3c2d7440
> > > > > > > > > 2009-10-07 13:27:28,638 INFO org.mortbay.util.Container: Started WebApplicationContext[/api,rest]
> > > > > > > > > 2009-10-07 13:27:28,639 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:60010
> > > > > > > > > 2009-10-07 13:27:28,639 INFO org.mortbay.util.Container: Started org.mortbay.jetty.Server@28b301f2
> > > > > > > > > 2009-10-07 13:27:28,640 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server Responder: starting
> > > > > > > > > 2009-10-07 13:27:28,641 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server listener on 60000: starting
> > > > > > > > > 2009-10-07 13:27:28,641 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60000: starting
> > > > > > > > > 2009-10-07 13:27:28,641 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60000: starting
> > > > > > > > > 2009-10-07 13:27:28,641 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60000: starting
> > > > > > > > > 2009-10-07 13:27:28,642 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60000: starting
> > > > > > > > > 2009-10-07 13:27:28,642 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60000: starting
> > > > > > > > > 2009-10-07 13:27:28,642 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60000: starting
> > > > > > > > > 2009-10-07 13:27:28,642 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60000: starting
> > > > > > > > > 2009-10-07 13:27:28,642 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60000: starting
> > > > > > > > > 2009-10-07 13:27:28,642 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60000: starting
> > > > > > > > > 2009-10-07 13:27:28,642 DEBUG org.apache.hadoop.hbase.master.HMaster: Started service threads
> > > > > > > > > 2009-10-07 13:27:28,643 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60000: starting
> > > > > > > > > 2009-10-07 13:28:09,519 INFO org.apache.hadoop.hbase.master.RegionManager: in safe mode
> > > > > > > > > 2009-10-07 13:28:11,542 INFO org.apache.hadoop.hbase.master.RegionManager: in safe mode
> > > > > > > > > 2009-10-07 13:28:13,543 INFO org.apache.hadoop.hbase.master.RegionManager: in safe mode
> > > > > > > > > 2009-10-07 13:28:15,545 INFO org.apache.hadoop.hbase.master.RegionManager: in safe mode
> > > > > > > > > 2009-10-07 13:28:17,548 INFO org.apache.hadoop.hbase.master.RegionManager: in safe mode
> > > > > > > > > 2009-10-07 13:28:19,555 INFO org.apache.hadoop.hbase.master.RegionManager: in safe mode
> > > > > > > > > 2009-10-07 13:28:27,834 INFO org.apache.hadoop.hbase.master.BaseScanner: All 0 .META. region(s) scanned
> > > > > > > > > 2009-10-07 13:29:27,832 INFO org.apache.hadoop.hbase.master.BaseScanner: All 0 .META. region(s) scanned
> > > > > > > > > 2009-10-07 13:29:37,593 INFO org.apache.hadoop.hbase.master.RegionManager: in safe mode
> > > > > > > > > 2009-10-07 13:30:27,834 INFO org.apache.hadoop.hbase.master.BaseScanner: All 0 .META. region(s) scanned
> > > > > > > > > 2009-10-07 13:31:27,836 INFO org.apache.hadoop.hbase.master.BaseScanner: All 0 .META. region(s) scanned
> > > > > > > > > 2009-10-07 13:32:27,838 INFO org.apache.hadoop.hbase.master.BaseScanner: All 0 .META. region(s) scanned
> > > > > > > > > 2009-10-07 13:33:27,840 INFO org.apache.hadoop.hbase.master.BaseScanner: All 0 .META. region(s) scanned
> > > > > > > >
> > > > > > > >
> > > > > > > > Ananth T Sarathy
> > > > > > > >
> > > > > > > >
> > > > > > > > > On Wed, Oct 7, 2009 at 1:20 PM, stack <stack@duboce.net> wrote:
> > > > > > > >
> > > > > > > > > > That's interesting to hear.  Keep us posted.
> > > > > > > > > >
> > > > > > > > > > HBase asks the filesystem if it's in safe mode and if it is, it parks
> > > > > > > > > > itself.  Here is code from master:
> > > > > > > > > >
> > > > > > > > > >    if (this.fs instanceof DistributedFileSystem) {
> > > > > > > > > >      // Make sure dfs is not in safe mode
> > > > > > > > > >      String message = "Waiting for dfs to exit safe mode...";
> > > > > > > > > >      while (((DistributedFileSystem) fs).setSafeMode(
> > > > > > > > > >          FSConstants.SafeModeAction.SAFEMODE_GET)) {
> > > > > > > > > >        LOG.info(message);
> > > > > > > > > >        try {
> > > > > > > > > >          Thread.sleep(this.threadWakeFrequency);
> > > > > > > > > >        } catch (InterruptedException e) {
> > > > > > > > > >          //continue
> > > > > > > > > >        }
> > > > > > > > > >      }
> > > > > > > > > >    }
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > Then there is hbase's notion of safemode.  It will be in safe mode until it
> > > > > > > > > > does its initial scan of catalog tables.  The master keeps a flag in zookeeper
> > > > > > > > > > while it's in safemode so regionservers are aware of the state:
> > > > > > > > > >
> > > > > > > > > >  public boolean inSafeMode() {
> > > > > > > > > >    if (safeMode) {
> > > > > > > > > >      if(isInitialMetaScanComplete() && regionsInTransition.size() == 0 &&
> > > > > > > > > >         tellZooKeeperOutOfSafeMode()) {
> > > > > > > > > >        master.connection.unsetRootRegionLocation();
> > > > > > > > > >        safeMode = false;
> > > > > > > > > >        LOG.info("exiting safe mode");
> > > > > > > > > >      } else {
> > > > > > > > > >        LOG.info("in safe mode");
> > > > > > > > > >      }
> > > > > > > > > >    }
> > > > > > > > > >    return safeMode;
> > > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > > Have you seen the .META. and -ROOT- deploy to regionservers?  Have you
> > > > > > > > > > seen these regions being scanned in the master log?  (Enable DEBUG if not
> > > > > > > > > > already enabled).
> > > > > > > > >
> > > > > > > > > Yours,
> > > > > > > > > ST.Ack
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > On Wed, Oct 7, 2009 at 10:06 AM, Ananth T. Sarathy <
> > > > > > > > > > ananth.t.sarathy@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > We have been running HBase on an S3 filesystem. It's the HBase regionserver,
> > > > > > > > > > > not HDFS, since we are using S3.  We haven't felt like it's been too slow,
> > > > > > > > > > > though the amount of data we are pushing isn't large enough to
> > > > > > > > > > > notice yet.
> > > > > > > > > > Ananth T Sarathy
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > On Wed, Oct 7, 2009 at 12:47 PM, stack <stack@duboce.net> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > HBase or HDFS is in safe mode.  My guess is that it's the latter.  Can you
> > > > > > > > > > > > figure from HDFS logs why it won't leave safe mode?  Usually
> > > > > > > > > > > > under-replication or a loss of a large swath of the cluster will flip on
> > > > > > > > > > > > the safe-mode switch.
> > > > > > > > > > > >
> > > > > > > > > > > > Are you trying to run HBase on an S3 filesystem?  An HBasista tried it in
> > > > > > > > > > > > the past and, FYI, found it insufferably slow.  Let us know how it goes
> > > > > > > > > > > > for you.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > St.Ack
> > > > > > > > > > >
> > > > > > > > > > > > On Wed, Oct 7, 2009 at 9:33 AM, Ananth T. Sarathy <
> > > > > > > > > > > > ananth.t.sarathy@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > My regionserver has been stuck in safemode. What can I do to get it out of
> > > > > > > > > > > > > safemode?
> > > > > > > > > > > >
> > > > > > > > > > > > Ananth T Sarathy
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > __________________________________________________
> > > > Do You Yahoo!?
> > > > Tired of spam?  Yahoo! Mail has the best spam protection around
> > > > http://mail.yahoo.com
> > >
> > >
> > >
> > >
> > >
> >
> >
> >
> >
> >
>
>
>



      