hbase-dev mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Hbase stuck after some hours
Date Fri, 09 Apr 2010 15:16:12 GMT
Hi,

This is likely a multiple assignment bug.

Can you grep the NN log for the block ID 991235084167234271? This should
tell you which file it was originally allocated to, as well as what IP wrote
it. You should also see a deletion later. Also, the filename should give you
a clue as to which region the block is from. You can then consult those
particular RS and master logs to see which servers deleted the file and why.
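The grep described above might look like the sketch below. The log paths, timestamps, file path, and datanode IP are made-up examples (a mock log is created so the commands are self-contained); the "allocateBlock" / "ask ... to delete" line formats are approximations of 0.20-era NameNode StateChange logging and may differ in your version.

```shell
BLOCK=991235084167234271

# Mock NameNode log with one allocation line and one deletion line
# (format approximated; real logs vary by Hadoop version).
NN_LOG=$(mktemp)
cat > "$NN_LOG" <<'EOF'
2010-04-09 10:00:01 INFO hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/mytable/123456/info/data. blk_991235084167234271
2010-04-09 10:42:13 INFO hdfs.StateChange: BLOCK* ask 10.0.0.5:50010 to delete blk_991235084167234271
EOF

# Allocation: shows which file the block was created for and, in real logs,
# which client IP wrote it.
grep "blk_${BLOCK}" "$NN_LOG" | grep allocateBlock

# Deletion: shows when the NameNode scheduled the block for removal.
grep "blk_${BLOCK}" "$NN_LOG" | grep -i delete
```

The file path in the allocateBlock line encodes table and region, which is what points you at the right RS and master logs.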

-Todd

On Fri, Apr 9, 2010 at 12:56 AM, Al Lias <al.lias@gmx.de> wrote:

> I repeatedly hit the following problem on 0.20.3 with
> dfs.datanode.socket.write.timeout=0: a RS is asked for some data, the
> DFS cannot find it, and the client hangs until timeout.
>
> Grepping the cluster logs, I can see this:
>
> 1. At some time the DFS is asked to delete a block, and the block is
> deleted from the datanodes.
>
> 2. Some minutes later, a RS seems to ask for exactly this block... DFS
> says "Block blk_.. is not valid." and then "No live nodes contain
> current block".
>
> (I have the xceivers and file descriptor limits set high,
> dfs.datanode.handler.count=10, no particularly high load, 17 servers
> with 24G RAM / 4 cores.)
>
> More log here: http://pastebin.com/cdqsy8Ae
>
> ?
>
> Thx, Al


-- 
Todd Lipcon
Software Engineer, Cloudera
