hbase-dev mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Hbase stuck after some hours
Date Fri, 09 Apr 2010 15:16:12 GMT

This is likely a multiple-assignment bug.

Can you grep the NN log for the block ID 991235084167234271? That should
tell you which file the block was originally allocated to, as well as which IP
wrote it. You should also see a deletion later. The filename should also give
you a clue as to which region the block belongs to. You can then consult those
particular RS and master logs to see which servers deleted the file and why.
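As a rough sketch of the grep workflow above: the snippet below fabricates a tiny sample of NameNode-style log lines (the paths, timestamps, and datanode address are hypothetical; real log locations and formats vary by deployment) and then greps for the block ID to find both the original allocation, which reveals the file and region, and the later deletion.

```shell
BLOCK=991235084167234271

# Fabricated NN log excerpt for illustration only; on a real cluster you
# would grep the actual NameNode log file instead.
cat > /tmp/nn-sample.log <<'EOF'
2010-04-09 00:10:01,123 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/table1/1234567890/info/8765 blk_991235084167234271_1001
2010-04-09 00:42:17,456 INFO hdfs.StateChange: BLOCK* NameSystem.delete: blk_991235084167234271 is added to invalidSet of 10.0.0.5:50010
EOF

# One grep shows both the allocation (file path -> region) and the deletion.
grep "blk_${BLOCK}" /tmp/nn-sample.log
```

Armed with the file path from the allocateBlock line and the timestamp of the delete, you can then search the corresponding RegionServer and master logs around that time for the compaction or split that removed the file.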


On Fri, Apr 9, 2010 at 12:56 AM, Al Lias <al.lias@gmx.de> wrote:

> I repeatedly hit the following problem with
> 0.20.3 and dfs.datanode.socket.write.timeout=0: a RS is asked for
> some data, the DFS cannot find it, and the client hangs until timeout.
> Grepping the cluster logs, I can see this:
> 1. At some point the DFS is asked to delete a block, and the block is
> deleted from the datanodes.
> 2. Some minutes later, a RS seems to ask for exactly this block; DFS
> says "Block blk_.. is not valid." and then "No live nodes contain
> current block".
> (I have the xceivers and file descriptor limits set high,
> dfs.datanode.handler.count=10, no particularly high load, 17 servers with
> 24G/4 cores.)
> More log here: http://pastebin.com/cdqsy8Ae
> ?
> Thx, Al

Todd Lipcon
Software Engineer, Cloudera
