hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: Data lost during intensive writes
Date Thu, 26 Mar 2009 17:13:04 GMT

Hi Schubert,

I set dfs.datanode.max.xcievers=4096 in my config. This was the
only way I was able to bring more than 7000 regions online on 25
nodes during cluster restart without DFS errors. The default is
definitely too low for HBase. HFile in 0.20 will have a material
impact here, which should help the situation. Perhaps more can or
will also be done with regard to HBASE-24 to relieve the load on
the DataNodes:

    https://issues.apache.org/jira/browse/HBASE-24?focusedCommentId=12613104&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12613104
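For reference, Hadoop site files express an override like this as a property element with name and value children. On the 0.19 line that file is hadoop-site.xml (later releases moved HDFS settings into hdfs-site.xml). The following is a minimal sketch in Python, standard library only, that emits such a fragment. site_config is a hypothetical helper, and 4096 is simply the value that worked on this particular cluster, not a general recommendation.

```python
import xml.etree.ElementTree as ET

def site_config(props):
    """Build a Hadoop-style <configuration> document (as a string) from a
    dict mapping property names to values. Hypothetical helper."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root)  # pretty-print the tree in place; needs Python 3.9+
    return ET.tostring(root, encoding="unicode")

# 4096 is the value used on the 25-node cluster described above (assumption
# that it suits any other cluster is NOT implied).
print(site_config({"dfs.datanode.max.xcievers": 4096}))
```

Building the tree with ElementTree rather than string formatting keeps the output well-formed even if a value ever contains characters such as < or &.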

The root cause of this is HADOOP-3856: https://issues.apache.org/jira/browse/HADOOP-3856

I looked at helping out on this issue. It involves so much
reimplementation of such a fundamental component of Hadoop
that it's difficult for a part-time volunteer to make
progress on it. Even if the code can be changed, there is
follow-up shepherding through Core review and release processes
to consider... I hold out hope that a commercial user of Hadoop
will feel pain in this area and commit sponsored resources to
address the issue of I/O scalability in DFS. I think when DFS
was written the expectation was that 10,000 nodes would have
only a few open files each (very large MapReduce inputs,
intermediates, and outputs), not that hundreds of nodes might
each have thousands of files open. In any case, the issue is well
known.

In my testing, I have found that "dfs.datanode.socket.write.timeout=0"
is not necessary for HBase 0.19.1 on Hadoop 0.19.1.
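A companion sketch, again hypothetical Python using only the standard library: it reads a site file and flags the two settings discussed in this thread. When dfs.datanode.max.xcievers is absent it assumes the default of 256 quoted below for the 0.19 line, and it uses 4096 as an illustrative threshold taken from the cluster above. The function names and the threshold are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Default xceiver limit quoted in this thread for the 0.19 line (assumption
# for any other release).
DEFAULT_MAX_XCIEVERS = 256

def read_site_props(xml_text):
    """Return {name: value} for every <property> in a Hadoop site file."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props

def check_for_hbase(xml_text, min_xcievers=4096):
    """Return human-readable warnings; an empty list means nothing to flag.
    min_xcievers is a hypothetical threshold, not an official limit."""
    props = read_site_props(xml_text)
    warnings = []
    xcievers = int(props.get("dfs.datanode.max.xcievers", DEFAULT_MAX_XCIEVERS))
    if xcievers < min_xcievers:
        warnings.append(f"dfs.datanode.max.xcievers={xcievers} is below {min_xcievers}")
    if "dfs.datanode.socket.write.timeout" in props:
        warnings.append("dfs.datanode.socket.write.timeout is overridden; "
                        "reportedly not needed for HBase 0.19.1 on Hadoop 0.19.1")
    return warnings

sample = """<configuration>
  <property><name>dfs.datanode.max.xcievers</name><value>1024</value></property>
  <property><name>dfs.datanode.socket.write.timeout</name><value>0</value></property>
</configuration>"""
for warning in check_for_hbase(sample):
    print(warning)
```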

Best regards,

   -Andy


> From: schubert zhang <zsongbo@gmail.com>
> Subject: Re: Data lost during intensive writes
> To: hbase-user@hadoop.apache.org, apurtell@apache.org
> Date: Thursday, March 26, 2009, 4:58 AM
>
> I will set "dfs.datanode.max.xcievers=1024" (default is 256)
> 
> I am using branch-0.19.
> Do you think "dfs.datanode.socket.write.timeout=0" is
> necessary in release-0.19?
> 
> Schubert

