hbase-user mailing list archives

From Dan Brodsky <danbrod...@gmail.com>
Subject Re: Regionservers not connecting to master
Date Fri, 02 Nov 2012 18:28:44 GMT
Nope. I'm honestly not sure how the files changed, but I will keep an eye
on it.


On Fri, Nov 2, 2012 at 2:22 PM, Kevin O'dell <kevin.odell@cloudera.com> wrote:

> Do you use Puppet?
>
> On Fri, Nov 2, 2012 at 1:13 PM, Dan Brodsky <danbrodsky@gmail.com> wrote:
>
> > Ram,
> >
> > I wanted to follow up with you, since your comment below helped me.
> >
> > It turns out that the ZK configuration files somehow got changed (reverted
> > to their default values?), and I'm not sure who/when/how. The zoo.cfg files
> > didn't have the list of quorum peers, and the myid files that tell each ZK
> > peer its ordinal value had been deleted. So, effectively, I had three
> > standalone ZK servers instead of one quorum.
> >
> > Problem fixed, HBase is happy again.
> >
> > Cheers,
> >
> > Dan
> >
> >
> >
> > On Wed, Oct 17, 2012 at 9:12 AM, Ramkrishna.S.Vasudevan <
> > ramkrishna.vasudevan@huawei.com> wrote:
> >
> > > Can you try starting some of the regionservers that are not connecting
> > > at all? Maybe start 2 of them.
> > > Observe the master logs and see whether they say
> > > 'Waiting for RegionServers to checkin'.
> > >
> > > Also, can you confirm that your ZK IP and port are correct throughout the
> > > cluster? If it is a multitenant cluster, maybe the other regionservers are
> > > connecting to some other ZK cluster?
> > > Wild guess :)
> > >
> > > Regards
> > > Ram
> > > > -----Original Message-----
> > > > From: Dan Brodsky [mailto:danbrodsky@gmail.com]
> > > > Sent: Wednesday, October 17, 2012 6:31 PM
> > > > To: user@hbase.apache.org
> > > > Subject: Regionservers not connecting to master
> > > >
> > > > Good morning,
> > > >
> > > > > I have a 10-node Hadoop/HBase cluster, plus a namenode VM, plus three
> > > > > ZooKeeper quorum peers (one on the namenode, one on a dedicated ZK
> > > > > peer VM, and one on a third box). All 10 HDFS datanodes are also HBase
> > > > > regionservers.
> > > >
> > > > > Several weeks ago, we had six HDFS datanodes go offline suddenly (with
> > > > > no meaningful error messages), and since then I have been unable to
> > > > > get all 10 regionservers to connect to the HBase master. I've tried
> > > > > bringing the cluster down and rebooting all the boxes, but no joy. The
> > > > > machines are all running, and hbase-regionserver appears to start
> > > > > normally on each one.
> > > >
> > > > Right now, my master status page (http://namenode:60010) shows 3
> > > > regionservers online. There are also dozens of regions in transition
> > > > > listed on the status page (in the PENDING_OPEN state), but each of
> > > > > those is on one of the regionservers already online.
> > > >
> > > > The 7 other regionservers' log files show a successful connection to
> > > > one ZK peer, followed by a regular trail of these messages:
> > > >
> > > > 2012-10-17 12:36:08,394 DEBUG
> > > > org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=8.17
> > > > MB, free=987.67 MB, max=995.84 MB, blocks=0, accesses=0, hits=0,
> > > > hitRatio=0cachingAccesses=0, cachingHits=0,
> > > > cachingHitsRatio=0evictions=0, evicted=0, evictedPerRun=NaN
> > > >
> > > > If I had to wager a guess, it seems like the 7 offline regionservers
> > > > are not connecting to other ZK peers, but there isn't anything in the
> > > > ZK logs to indicate why.
> > > >
> > > > Thoughts?
> > > >
> > > > Dan
> > >
> > >
> >
>
>
>
> --
> Kevin O'Dell
> Customer Operations Engineer, Cloudera
>
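
Editor's note: the root cause described in this thread (zoo.cfg files with no quorum peer list, and deleted myid files) can be checked for mechanically. The following is only a minimal sketch in Python, not anything posted by the participants. The function names, the host names (zk1.example.com and so on), and the dataDir path are hypothetical; the only ZooKeeper conventions it relies on are the server.N=host:port:port lines in zoo.cfg and a myid file containing that peer's number.

```python
import re


def quorum_servers(zoo_cfg_text):
    """Return {server_id: address} for every server.N line in a zoo.cfg."""
    servers = {}
    for line in zoo_cfg_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = re.match(r"server\.(\d+)\s*=\s*(\S+)", line)
        if match:
            servers[int(match.group(1))] = match.group(2)
    return servers


def diagnose(zoo_cfg_text, myid_text):
    """Return a list of problems; an empty list means nothing was flagged.

    myid_text is the content of the myid file, or None if the file is absent.
    """
    problems = []
    servers = quorum_servers(zoo_cfg_text)
    if not servers:
        problems.append("no server.N entries: peer will run standalone")
    if myid_text is None:
        problems.append("myid file is missing")
    elif not myid_text.strip().isdecimal():
        problems.append("myid does not contain a number")
    elif servers and int(myid_text.strip()) not in servers:
        problems.append("myid does not match any server.N entry")
    return problems


# Hypothetical three-peer configuration (host names are placeholders).
quorum_cfg = """
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
"""

# A configuration with the peer list gone, as described in the thread.
reverted_cfg = """
dataDir=/var/lib/zookeeper
clientPort=2181
"""

print(diagnose(quorum_cfg, "2\n"))   # []
print(diagnose(reverted_cfg, None))  # two problems reported
```

In practice one would read zoo.cfg and the myid file from each peer's own configuration and data directories and run the check on every peer, since each peer must list the same servers but carry a different myid.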
