hbase-user mailing list archives

From Trevor Antczak <tantc...@operasolutions.com>
Subject Daemon dying immediately
Date Fri, 02 Aug 2013 19:21:05 GMT
Hi all,

I've had a Hadoop system with HBase working for quite a long time now.  We've got hadoop-hbase-master-0.90.6+84.73-1
installed on Red Hat 5, with four regionservers on slave nodes, and the REST and Thrift servers
running on the master.  Just today, pretty much without warning, the master crashed.  Now
we can't restart it.  It starts, and then almost immediately dies.  No error message appears
in the log, though it appears to be cleaning itself up normally.  The log contains only:

2013-08-02T14:34:40.142-0400: [GC [ParNew: 17024K->1334K(19136K), 0.0052490 secs] 17024K->1334K(83008K), 0.0053100 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
2013-08-02T14:34:40.347-0400: [GC [1 CMS-initial-mark: 0K(63872K)] 9036K(83008K), 0.0071700 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
2013-08-02T14:34:40.471-0400: [GC [ParNew: 18358K->1234K(19136K), 0.0265690 secs] 18358K->2644K(83008K), 0.0266550 secs] [Times: user=0.12 sys=0.00, real=0.03 secs]
2013-08-02T14:34:40.630-0400: [CMS-concurrent-mark: 0.013/0.275 secs] [Times: user=0.53 sys=0.01, real=0.27 secs]
2013-08-02T14:34:40.645-0400: [CMS-concurrent-preclean: 0.014/0.015 secs] [Times: user=0.01 sys=0.00, real=0.02 secs]
2013-08-02T14:34:40.645-0400: [CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2013-08-02T14:34:40.645-0400: [GC[YG occupancy: 7584 K (19136 K)][Rescan (parallel) , 0.0030240 secs][weak refs processing, 0.0000090 secs] [1 CMS-remark: 1410K(63872K)] 8994K(83008K), 0.0031230 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
2013-08-02T14:34:40.649-0400: [CMS-concurrent-sweep: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2013-08-02T14:34:40.726-0400: [CMS-concurrent-reset: 0.077/0.077 secs] [Times: user=0.02 sys=0.05, real=0.08 secs]
par new generation   total 19136K, used 7584K [0x00002b7281fe0000, 0x00002b72834a0000, 0x00002b72957e0000)
  eden space 17024K,  37% used [0x00002b7281fe0000, 0x00002b7282613928, 0x00002b7283080000)
  from space 2112K,  58% used [0x00002b7283080000, 0x00002b72831b4838, 0x00002b7283290000)
  to   space 2112K,   0% used [0x00002b7283290000, 0x00002b7283290000, 0x00002b72834a0000)
concurrent mark-sweep generation total 63872K, used 1410K [0x00002b72957e0000, 0x00002b7299640000,
concurrent-mark-sweep perm gen total 26256K, used 15758K [0x00002b7475fe0000, 0x00002b7477984000,

And if I restart, I get essentially the same log overwriting this one (with new timestamps,
of course). The REST server, the Thrift server, and all the regionservers appear fine.  There are no issues with
disk space or resources on the server box, and HDFS appears fine.  Any advice on other places
I can look for more data, or on how I might get more granularity in the logs?  Or does someone
see an error I'm missing in what is already being logged?

Thanks in advance,
Trevor Antczak
