hbase-dev mailing list archives

From Al Lias <al.l...@gmx.de>
Subject Re: Hbase stuck after some hours
Date Sat, 10 Apr 2010 19:08:34 GMT
Thanks for looking into it, Todd,

On 09.04.2010 17:16, Todd Lipcon wrote:
> Hi,
> This is likely a multiple assignment bug.

I tried again, this time grepping for the region that a client
could not find. It looks like something related to "multiple assignment".


> Can you grep the NN log for the block ID 991235084167234271 ? This should
> tell you which file it was originally allocated to, as well as what IP wrote
> it. You should also see a deletion later. Also, the filename should give you
> a clue as to which region the block is from. You can then consult those
> particular RS and master logs to see which servers deleted the file and why.

Please help: http://pastebin.com/zUxqyyfU (not sorted by time)
I can only see that the Master advised the deletion...

(This error is a different instance of the same problem as the one above.)
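
Since the pastebin above is not sorted by time, a small script along these lines could merge the grep hits for one block ID from the NN, RS and master logs into a single timeline, so that the deletion and the later read request show up in order. It is only a sketch: it assumes every relevant log line starts with a log4j-style timestamp such as "2010-04-09 07:12:01,123", and the names parse_ts and block_timeline are made up for this example.

```python
import re
from datetime import datetime

# Assumption: log lines begin with "YYYY-MM-DD HH:MM:SS,mmm".
# Adjust the pattern if your log4j layout differs.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    m = TS_RE.match(line)
    if m is None:
        return None
    base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return base.replace(microsecond=int(m.group(2)) * 1000)


def block_timeline(lines, block_id):
    """Return the lines that mention block_id, sorted by timestamp.

    Lines without a leading timestamp (for example stack trace
    continuation lines) are skipped.
    """
    hits = []
    for line in lines:
        if block_id not in line:
            continue
        ts = parse_ts(line)
        if ts is not None:
            hits.append((ts, line.rstrip("\n")))
    hits.sort(key=lambda pair: pair[0])
    return [line for _, line in hits]
```

To use it, read the lines of all relevant log files into one list and call block_timeline(all_lines, "991235084167234271"), or whichever block ID the failing request reports. Matching is a plain substring test, so it also catches IDs written with a "blk_" prefix.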



> -Todd
> On Fri, Apr 9, 2010 at 12:56 AM, Al Lias <al.lias@gmx.de> wrote:
>> I repeatedly have the following problem with
>> 0.20.3/dfs.datanode.socket.write.timeout=0: Some RS is requested for
>> some data, the DFS cannot find it, and the client hangs until timeout.
>> Grepping the cluster logs, I can see this:
>> 1. at some time the DFS is asked to delete a block, blocks are deleted
>> from the datanodes
>> 2. some minutes later, a RS seems to ask for exactly this block...DFS
>> says "Block blk_.. is not valid." and then "No live nodes contain
>> current block".
>> (I have xceivers and the file descriptor limit set high,
>> dfs.datanode.handler.count=10, no particularly high load, 17 servers with
>> 24G/4 cores)
>> More log here: http://pastebin.com/cdqsy8Ae
>> ?
>> Thx, Al
