hbase-user mailing list archives

From Friso van Vollenhoven <fvanvollenho...@xebia.com>
Subject Re: substantial performance degradation when using WAL
Date Mon, 20 Dec 2010 07:58:23 GMT
Hi Stack,

The NPE is this:
10/12/18 15:39:07 WARN hdfs.StateChange: DIR* FSDirectory.unprotectedSetTimes: failed to setTimes /hbase/inrdb_ris_update_rrc00/fe5090c366e326cf2b123502e2d4bcce/data/1350525083587292896 because source does not exist
10/12/18 15:39:07 WARN hdfs.StateChange: DIR* FSDirectory.unprotectedSetTimes: failed to setTimes /hbase/inrdb_ris_update_rrc00/fe5090c366e326cf2b123502e2d4bcce/meta/4413022065008239343 because source does not exist
10/12/18 15:39:07 DEBUG namenode.FSNamesystem: 0: /hbase/.logs/w2r1.inrdb.ripe.net,60020,1292333234919/w2r1.inrdb.ripe.net%3A60020.1292336839737 numblocks : 0 clientHolder DFSClient_131715208 clientMachine
10/12/18 15:39:07 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedDelete: failed to remove because it does not exist
10/12/18 15:39:07 ERROR namenode.NameNode: java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1088)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1100)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1003)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:206)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:637)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1039)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:845)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:379)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:99)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:343)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:317)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:214)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:394)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1148)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1157)

10/12/18 15:39:07 INFO namenode.NameNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down NameNode at m1r1.inrdb.ripe.net/

It looks like the array populated by INodeDirectoryWithQuota#getExistingPathINodes(...) contains
a null somewhere, which FSDirectory#addChild(...) does not expect. I marked the entries for the
broken file as invalid in the edit log with a hex editor, after which the NN does come back
online. It then reports that only about 88% of all blocks have been reported and stays in
safe mode. Of course I could lower the threshold to make it leave safe mode, but I am wondering
whether it simply stopped persisting edits at some point, so that the only correct version of
the namespace existed in memory.
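
For reference, the safe-mode behaviour described above can be sketched as a simple ratio check. This is an illustrative model, not Hadoop code: the function name and the block counts are hypothetical, and the default of 0.999 is assumed to correspond to the dfs.safemode.threshold.pct setting in Hadoop 0.20-era releases. The real NameNode logic has more conditions than this.

```python
# Illustrative model of the NameNode safe-mode exit check; not Hadoop code.
# Assumption: the threshold corresponds to dfs.safemode.threshold.pct,
# with a default of 0.999 in Hadoop 0.20-era releases.

DEFAULT_THRESHOLD = 0.999


def can_leave_safe_mode(reported_blocks: int, total_blocks: int,
                        threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Return True when the fraction of reported blocks meets the threshold."""
    if total_blocks == 0:
        return True  # an empty namespace has nothing to wait for
    return reported_blocks / total_blocks >= threshold


# Hypothetical block counts chosen to match the roughly 88% figure.
total = 1_000_000
reported = 880_000

print(can_leave_safe_mode(reported, total))        # default threshold
print(can_leave_safe_mode(reported, total, 0.85))  # lowered threshold
```

Lowering the threshold only changes when safe mode ends. The blocks that were never reported remain missing, so files that reference them would still be unreadable.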

I am running a secondary NN, but restoring a checkpoint doesn't solve the problem. We store the NN
data on a filer that takes hourly, daily and weekly snapshots, so I could probably go back
to a working version, but I don't think HBase would work afterwards. We do a lot of updates
on our data, so splits and compactions are quite common, and I expect an older version of the NN
data would point to blocks that no longer exist.
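
The concern about restoring older NN data can be illustrated with a small set comparison. This is a sketch under stated assumptions: the file names and block IDs are made up, and missing_blocks is a hypothetical helper, not part of HDFS or HBase.

```python
from typing import Dict, Set

# Illustrative only: file names and block IDs below are made up.


def missing_blocks(namespace: Dict[str, Set[int]],
                   blocks_on_datanodes: Set[int]) -> Dict[str, Set[int]]:
    """Map each file to the referenced blocks that no DataNode still holds."""
    result = {}
    for path, blocks in namespace.items():
        lost = blocks - blocks_on_datanodes
        if lost:
            result[path] = lost
    return result


# Namespace as recorded in an older snapshot of the NN data.
old_namespace = {
    "region/storefile_a": {1, 2},
    "region/storefile_b": {3},
}

# Since that snapshot, a compaction rewrote a and b into a new store file
# (blocks 4 and 5), and the old blocks were deleted from the DataNodes.
current_blocks = {4, 5}

print(missing_blocks(old_namespace, current_blocks))
```

In this model every file known to the old namespace has lost all of its blocks, while blocks 4 and 5 belong to no file the restored NN knows about.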

We have secured storage with all our source data, so re-importing everything is an option,
but it takes about two weeks and, above all, is probably quite bad for Hadoop's
reputation within the organization. My main concern is this happening again.

(Sorry for being a bit off topic on this list, but the hdfs-user and cdh-user lists didn't
produce any responses on this.)


On 19 dec 2010, at 19:52, Stack wrote:

On Sun, Dec 19, 2010 at 1:23 AM, Friso van Vollenhoven
<fvanvollenhoven@xebia.com> wrote:
Right now, however, I am in the unpleasant situation that my NN won't come up anymore after
a restart (throws NPE), so I need to get that fixed first (without formatting, because I am
not very keen on running the 6 day job again). I did a restart of everything to make sure
that anything that was swapped out before got back to memory, but I guess restarting the NN
could have better been left for another time...

You running secondary namenode?

What kinda NPE you seeing?

