hbase-user mailing list archives

From Kristoffer Sjögren <sto...@gmail.com>
Subject RegionServer aborting and shutting down
Date Thu, 22 Sep 2016 08:17:28 GMT
Hi

We are running OpenTSDB 2.2 with HBase 1.1.2 and are having problems
with RegionServers that are shutting down sporadically from alleged GC
pauses.

We run 2 OpenTSDB machines and 30 region servers. 8 GB heaps. The
region servers are collocated with data nodes and YARN jobs. Each
region server receives around 1000 req/s.

Even though the logs say it's a GC pause, monitoring doesn't report
an actual pause. The rather suspicious log line says wal.FSHLog: Slow
sync cost: 56257 ms, and it appears just after the GC pause detector
warns and the region server aborts. CPU, memory and network look fine.
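
To line the slow syncs up against the GC warnings, something like the following could pull them out of a region server log. This is only a rough sketch: it assumes nothing beyond the message format quoted above ("Slow sync cost: <n> ms"), and the function name slow_syncs and the 10 second threshold are made up for illustration.

```python
import re

# Matches the message format quoted above, e.g. "Slow sync cost: 56257 ms".
SLOW_SYNC = re.compile(r"Slow sync cost: (\d+) ms")


def slow_syncs(lines, threshold_ms=10_000):
    """Return (line_number, cost_ms) for every slow sync at or above threshold_ms.

    threshold_ms is an arbitrary illustrative cutoff, not an HBase setting.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        match = SLOW_SYNC.search(line)
        if match:
            cost_ms = int(match.group(1))
            if cost_ms >= threshold_ms:
                hits.append((number, cost_ms))
    return hits
```

Passing an open log file (for example open("hbase.log", errors="replace")) as lines gives the line numbers of the long syncs, which can then be compared by hand with the lines where the pause detector warns.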

We have had this problem for a long time and have been troubleshooting
thoroughly, but we are still clueless.

Any advice would be helpful.

Cheers,
-Kristoffer

[1] https://www.dropbox.com/s/m2cuutcdh81itay/hbase.log?dl=0
