hbase-issues mailing list archives

From "Biju Nair (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-10643) Failure in RS when using large size bucketcache
Date Thu, 06 Mar 2014 14:08:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-10643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13922541#comment-13922541 ]

Biju Nair commented on HBASE-10643:
-----------------------------------

1) Did not encounter this issue when testing recently with bucket cache using the direct buffer ioengine and MaxDirectMemorySize of 24g and 32g?
  - Yes
2) Can you share your JVM version particulars?
  - java -version
    java version "1.7.0_45"
    OpenJDK Runtime Environment (rhel-2.4.3.3.el6-x86_64 u45-b15)
    OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)
3) Command line options you might have put in hbase-env.sh?
  - This is from the 24 GB test (hbase-env.sh):
  - export HBASE_REGIONSERVER_OPTS="-Xmn512m -XX:CMSInitiatingOccupancyFraction=70 -Xms5000m -Xmx5000m -XX:MaxDirectMemorySize=22000m"
4) Any of the HBase site file settings pertaining to ZooKeeper?
  - No
5) Running bucketCache in file mode doesn't have this issue.

> Failure in RS when using large size bucketcache
> -----------------------------------------------
>
>                 Key: HBASE-10643
>                 URL: https://issues.apache.org/jira/browse/HBASE-10643
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.98.0, 0.96.0
>            Reporter: Biju Nair
>              Labels: bucketCache, regionserver
>
> When the RS is brought up with -XX:MaxDirectMemorySize of 22GB or higher, the RS fails after a successful start. From the RS logs, it looks like the bucketCache memory allocation takes long enough that ZK considers the RS dead. One option to fix the problem would be to allocate the bucketCache before registering with ZK.
> 2014-02-28 18:54:42,967 WARN  [regionserver60020.compactionChecker] util.Sleeper: We slept 33496ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2014-02-28 18:54:42,967 WARN  [regionserver60020.periodicFlusher] util.Sleeper: We slept 33496ms instead of 10000ms, this is likely due to a long garbage collecting pause and it's usually bad, see http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2014-02-28 18:54:42,967 WARN  [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 23988ms
> GC pool 'ParNew' had collection(s): count=1 time=24432ms
> 2014-02-28 18:54:43,006 FATAL [regionserver60020] regionserver.HRegionServer: ABORTING region server bbg-master2.bbg-test.hdp,60020,1393628951236: org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently processing bbg-master2.bbg-test.hdp,60020,1393628951236 as dead server
>         at org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:341)
>         at org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:254)
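The issue description attributes the failure to the time spent allocating a large off-heap bucketCache, and suggests allocating it before registering with ZK. Below is a minimal Java sketch of that ordering, under stated assumptions: `OffHeapCache`, `start`, and the "registered with ZooKeeper" step are hypothetical placeholders, not HBase's BucketCache or HRegionServer API. It allocates the cache as fixed-size direct buffers, times the allocation, and only then records the registration step, so a slow allocation finishes before the server becomes visible to ZK rather than while its session is live.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class StartupOrderSketch {

    // Hypothetical stand-in for an off-heap block cache; NOT HBase's BucketCache API.
    static final class OffHeapCache {
        private final List<ByteBuffer> buffers = new ArrayList<>();

        OffHeapCache(long totalBytes, int chunkBytes) {
            long remaining = totalBytes;
            while (remaining > 0) {
                int size = (int) Math.min(chunkBytes, remaining);
                // In OpenJDK, allocateDirect zero-fills the new memory, so total
                // allocation time grows with the configured cache size.
                buffers.add(ByteBuffer.allocateDirect(size));
                remaining -= size;
            }
        }

        // Sum of the capacities of all allocated chunks.
        long capacity() {
            long total = 0;
            for (ByteBuffer b : buffers) {
                total += b.capacity();
            }
            return total;
        }
    }

    // Hypothetical startup sequence: allocate the cache first, register afterwards.
    static List<String> start(long cacheBytes, int chunkBytes) {
        List<String> events = new ArrayList<>();
        long t0 = System.nanoTime();
        OffHeapCache cache = new OffHeapCache(cacheBytes, chunkBytes);
        long elapsedMs = (System.nanoTime() - t0) / 1_000_000;
        events.add("cache allocated: " + cache.capacity() + " bytes in " + elapsedMs + " ms");
        // Placeholder for creating the ZooKeeper session / reporting to the master.
        events.add("registered with ZooKeeper");
        return events;
    }

    public static void main(String[] args) {
        // 64 MB cache in 4 MB chunks; a real deployment would be far larger.
        for (String event : start(64L * 1024 * 1024, 4 * 1024 * 1024)) {
            System.out.println(event);
        }
    }
}
```

Running `main` prints the allocation event followed by the registration event; the elapsed time depends on the machine and is not a measurement from this report.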



--
This message was sent by Atlassian JIRA
(v6.2#6252)
