hbase-user mailing list archives

From divye sheth <divs.sh...@gmail.com>
Subject Re: uneven region distribution
Date Sat, 15 Feb 2014 03:54:26 GMT
The 2417 is the total region count across the cluster, not the count on the
dead server. When a regionserver crashes, the master automatically reassigns
its regions to the remaining servers.
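
The 2417 figure can be checked against the quoted status output. A minimal
sketch, assuming the per-server lines are pasted in as text (the regex and the
helper name are my own, not part of any HBase tooling):

```python
import re

# Live-server lines copied from the quoted `status 'simple'` output.
STATUS_OUTPUT = """\
server7:60020 1392017875910 requestsPerSecond=0, numberOfOnlineRegions=419, usedHeapMB=3315, maxHeapMB=6127
server4:60020 1392300859332 requestsPerSecond=843, numberOfOnlineRegions=379, usedHeapMB=2070, maxHeapMB=6127
server3:60020 1391583646998 requestsPerSecond=429, numberOfOnlineRegions=653, usedHeapMB=3198, maxHeapMB=6127
server6:60020 1391583647588 requestsPerSecond=0, numberOfOnlineRegions=966, usedHeapMB=2975, maxHeapMB=6127
"""


def regions_per_server(status_text):
    """Map each server name to its numberOfOnlineRegions value."""
    counts = {}
    for line in status_text.splitlines():
        match = re.match(r"(\S+):\d+ .*numberOfOnlineRegions=(\d+)", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


counts = regions_per_server(STATUS_OUTPUT)
total = sum(counts.values())
print(counts)
print("total regions on live servers:", total)
```

The sum over the four live servers matches the "regions: 2417" value on the
"Aggregate load" line.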

Also, when you run the balancer manually, note that it balances each table
across the regionservers. So if a table has 20 regions in total, then in your
case the mean would be 4 per regionserver. Check in the HBase UI whether each
table has (average +- 1) regions on every regionserver.
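
A minimal sketch of that per-table check, assuming you have copied the
region-to-server assignments out of the UI as (table, server) pairs. The table
names and assignments below are made up for illustration, and the +- 1 slack is
the rule of thumb from this mail, not the balancer's exact logic:

```python
from collections import Counter


def unbalanced_tables(assignments, servers, slack=1):
    """Return {table: {server: count}} for tables where any server is more
    than `slack` regions away from that table's mean regions per server."""
    per_table = {}
    for table, server in assignments:
        per_table.setdefault(table, Counter())[server] += 1

    result = {}
    for table, counts in per_table.items():
        mean = sum(counts.values()) / len(servers)
        # Servers hosting no region of this table count as 0.
        per_server = {s: counts.get(s, 0) for s in servers}
        if any(abs(c - mean) > slack for c in per_server.values()):
            result[table] = per_server
    return result


servers = ["server3", "server4", "server5", "server6", "server7"]

# Hypothetical data: t1 has 20 regions spread evenly (4 per server),
# t2 has 20 regions with 12 of them on server5.
assignments = [("t1", s) for s in servers for _ in range(4)]
assignments += [("t2", "server5")] * 12
assignments += [("t2", s) for s in servers if s != "server5" for _ in range(2)]

unbalanced = unbalanced_tables(assignments, servers)
for table, per_server in unbalanced.items():
    print(table, per_server)
```

With 5 servers and 20 regions per table the mean is 4, so only t2 is reported.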

Thanks
D
On Feb 15, 2014 9:13 AM, "Ted Yu" <yuzhihong@gmail.com> wrote:

> Please take a look at http://hbase.apache.org/book.html#hbase_metrics.
>
> You should pay attention to callQueueLength, compactionQueueLength,
> readRequestsCount and writeRequestsCount.
>
> Cheers
>
>
> On Fri, Feb 14, 2014 at 7:13 PM, Rohit Kelkar <rohitkelkar@gmail.com>
> wrote:
>
> > It could have been under load because I am not salting the keys. If I
> > were in a position to replicate this issue what metrics should I capture
> > so that I find whether it was under load?
> >
> > - R
> >
> > On Friday, February 14, 2014, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > From region server log - was server5 under heavy load ?
> > >
> > >
> > >    1. 2014-02-14 16:06:05,700 WARN org.apache.hadoop.hbase.util.Sleeper:
> > >    We slept 99984ms instead of 3000ms, this is likely due to a long
> > >    garbage collecting pause and it's usually bad, see
> > >    http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> > >    2. ...
> > >    3. 2014-02-14 16:06:05,783 FATAL
> > >    org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
> > >    server server5,60020,1392355987269: Unhandled exception:
> > >    org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected;
> > >    currently processing server5,60020,1392355987269 as dead server
> > >
> > >
> > >
> > > On Fri, Feb 14, 2014 at 5:00 PM, Rohit Kelkar <rohitkelkar@gmail.com>
> > > wrote:
> > >
> > > > Thanks for your inputs,
> > > > I am sharing the master log - http://pastebin.com/Xi9P6Ykr
> > > > and the region server log of the failed region server -
> > > > http://pastebin.com/1munghDv
> > > >
> > > > - R
> > > >
> > > >
> > > > On Fri, Feb 14, 2014 at 6:24 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >
> > > > > Looking at bug fix since 0.94.2, I wonder if you are experiencing the
> > > > > following which went into 0.94.10 :
> > > > > HBASE-8432 a table with unbalanced regions will balance indefinitely
> > > > >
> > > > > Master log would tell us more.
> > > > >
> > > > >
> > > > > On Fri, Feb 14, 2014 at 4:18 PM, Rohit Kelkar <rohitkelkar@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Sorry, mis-stated the version, it's 0.94.2
> > > > > >
> > > > > > - R
> > > > > >
> > > > > >
> > > > > > On Fri, Feb 14, 2014 at 5:59 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > >
> > > > > > > bq.  it does not change the status of the assignments.
> > > > > > >
> > > > > > > Can you check / pastebin master log to see what caused the
> > > > > > > balancing to stop ?
> > > > > > >
> > > > > > > bq. attributing the region server crash to the disproportionately
> > > > > > > high number of regions on that server?
> > > > > > >
> > > > > > > Checking region server log on server5 should give us more clue.
> > > > > > >
> > > > > > > bq. 0.92.4
> > > > > > >
> > > > > > > please consider upgrading :-)
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Feb 14, 2014 at 3:52 PM, Rohit Kelkar <rohitkelkar@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I am using hbase version 0.92.4 on a 5 node cluster. I am seeing
> > > > > > > > that a particular region server often crashes. A status 'simple'
> > > > > > > > on hbase shell gives the following stats
> > > > > > > >
> > > > > > > >
> > > > > > > > HBase Shell; enter 'help<RETURN>' for list of supported commands.
> > > > > > > > Type "exit<RETURN>" to leave the HBase Shell
> > > > > > > > Version 0.94.2, r1395367, Sun Oct 7 19:11:01 UTC 2012
> > > > > > > > status 'simple'
> > > > > > > > 4 live servers
> > > > > > > > server7:60020 1392017875910 requestsPerSecond=0,
> > > > > > > > numberOfOnlineRegions=419, usedHeapMB=3315, maxHeapMB=6127
> > > > > > > > server4:60020 1392300859332 requestsPerSecond=843,
> > > > > > > > numberOfOnlineRegions=379, usedHeapMB=2070, maxHeapMB=6127
> > > > > > > > server3:60020 1391583646998 requestsPerSecond=429,
> > > > > > > > numberOfOnlineRegions=653, usedHeapMB=3198, maxHeapMB=6127
> > > > > > > > server6:60020 1391583647588 requestsPerSecond=0,
> > > > > > > > numberOfOnlineRegions=966, usedHeapMB=2975, maxHeapMB=6127
> > > > > > > > 1 dead servers
> > > > > > > > server5,60020,1392108515637
> > > > > > > > Aggregate load: 1272, regions: 2417
> > > > > > > >
> > > > > > > > The dead region server has 2417 regions as opposed to 419, 379,
> > > > > > > > 653, 966 regions on other servers. Am I right in attributing the
> > > > > > > > region server crash to the disproportionately high number of
> > > > > > > > regions on that server?
> > > > > > > >
> > > > > > > > If I invoke the balancer on hbase shell using the "balancer"
> > > > > > > > command it returns true. But it does not change the status of the
> > > > > > > > assignments.
> > > > > > > >
> > > > > > > > - R
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
