hbase-user mailing list archives

From Andy Sautins <andy.saut...@returnpath.net>
Subject DFS stability running HBase and dfs.datanode.handler.count...
Date Sat, 09 Apr 2011 23:35:05 GMT

    I ran across a mailing list posting from 1/4/2009 that seemed to indicate increasing
dfs.datanode.handler.count could help improve DFS stability (http://mail-archives.apache.org/mod_mbox/hbase-user/200901.mbox/%3C49605FE0.9040509@duboce.net%3E
).  The posting seems to indicate the wiki was updated, but I don't see anything in the wiki
about increasing dfs.datanode.handler.count.   I have seen a few other notes that seem to
show examples that have raised dfs.datanode.handler.count, including one from an IBM article
and the Pro Hadoop book, but other than that the only other mention I see is from Cloudera, which
seems lukewarm on increasing dfs.datanode.handler.count (http://www.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/
).
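
    For reference, the kind of change in question would look roughly like the fragment below in
hdfs-site.xml on each datanode. This is only a sketch: the value 10 is an illustrative assumption,
not a figure taken from any of the sources above and not a recommendation.

```xml
<!-- hdfs-site.xml (datanode side); the value below is a placeholder for illustration only -->
<property>
  <name>dfs.datanode.handler.count</name>
  <value>10</value>
</property>
```

The datanodes would need a restart to pick up a change like this.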

    Given the post is from 2009 I thought I'd ask if anyone has had any success improving
stability of HBase/DFS when increasing dfs.datanode.handler.count.  The specific error we
are seeing somewhat frequently (a few hundred times per day) in the datanode logs is as follows:

2011-04-09 00:12:48,035 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(,
storageID=DS-1501576934-, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Block blk_-163126943925471435_28809750 is not valid.

   The above seems to correspond to ClosedChannelExceptions in the hbase regionserver logs
as well as some warnings about long writes to the hlog (some taking 50+ seconds).

    The biggest end-user facing issue we are seeing is that Task Trackers keep getting blacklisted.
 It's quite possible our problem is unrelated to anything HBase, but I thought it was worth
asking given what we've been seeing.

   We are currently running 0.91 on an 18-node cluster with ~3k total regions, and each region
server is running with 2G of memory.

   Any insight would be appreciated.


