From: "Bryan Duxbury (JIRA)"
To: hbase-dev@hadoop.apache.org
Date: Wed, 16 Apr 2008 16:27:22 -0700 (PDT)
Subject: [jira] Updated: (HBASE-24) Scaling: Too many open file handles to datanodes
Message-ID: <1056920783.1208388442535.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/HBASE-24?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury updated HBASE-24:
-------------------------------
    Fix Version/s:     (was: 0.2.0)

This is really more of a DFS issue, and there's an admittedly suboptimal solution to be had in increasing the max number of open file handles at the OS level. As such, we're going to hold off on solving this issue until after 0.2.

> Scaling: Too many open file handles to datanodes
> ------------------------------------------------
>
>                 Key: HBASE-24
>                 URL: https://issues.apache.org/jira/browse/HBASE-24
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: stack
>            Priority: Critical
>
> We've been here before (HADOOP-2341).
> Today Rapleaf gave me an lsof listing from a regionserver. It had thousands of open sockets to datanodes, all in ESTABLISHED and CLOSE_WAIT state. On average they seem to have about ten file descriptors/sockets open per region (they have 3 column families IIRC; each family can have between 1 and 5 or so mapfiles open -- 3 is the max, but while compacting we open a new one, etc.).
> They have thousands of regions. 400 regions -- ~100G, which is not that much -- take about 4k open file handles.
> If they want a regionserver to serve a decent disk's worth -- 300-400G -- then that's maybe 1600 regions... 16k file handles. With more than just 3 column families, we are in danger of blowing out limits if they are set at 32k.
> We've been here before with HADOOP-2341.
> A dfsclient that used non-blocking i/o would help applications like hbase (the datanode doesn't have this problem as badly -- the CLOSE_WAIT sockets on the regionserver side, the bulk of the open fds in the Rapleaf log, don't have a corresponding open resource on the datanode end).
> Could also just open mapfiles as needed, but that'd kill our random read performance, and it's bad enough already.

-- 
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
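As a quick back-of-envelope check, the sketch below simply re-runs the descriptor arithmetic quoted in the issue description (3 column families, roughly 3 mapfiles open per family plus one extra while compacting, about one descriptor/socket per open mapfile). The class name and constants are invented for the illustration; this is not HBase code and only reproduces the numbers in stack's report.

public class FileHandleEstimate {

    // Figures taken from the issue description above (assumptions, not measured values).
    static final int FAMILIES_PER_REGION = 3;
    static final int MAPFILES_PER_FAMILY = 3;     // "3 is the max" per family
    static final int EXTRA_DURING_COMPACTION = 1; // a new mapfile is open while compacting

    // Roughly one descriptor/socket per open mapfile, per the lsof listing (~10 per region).
    static int handlesPerRegion() {
        return FAMILIES_PER_REGION * MAPFILES_PER_FAMILY + EXTRA_DURING_COMPACTION;
    }

    public static void main(String[] args) {
        // 400 regions (~100G) -> ~4k handles; 1600 regions (300-400G) -> ~16k,
        // uncomfortably close to a 32k OS limit once more families are involved.
        for (int regions : new int[] {400, 1600}) {
            System.out.printf("%d regions -> ~%d open handles%n",
                regions, regions * handlesPerRegion());
        }
    }
}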
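The last quoted line, about opening mapfiles only as needed, trades descriptors for per-read latency. As a toy illustration only (plain java.io, hypothetical names, nothing to do with Hadoop's MapFile API), the sketch below contrasts a reader that keeps its descriptor open with one that re-opens the file on every call: the second holds no descriptor between reads, but every random read pays for an open and a close on top of the seek.

import java.io.IOException;
import java.io.RandomAccessFile;

public class LazyOpenSketch {

    // Keep-open style: one descriptor held for the reader's lifetime, so a
    // random read is just a seek plus a read.
    static int readKeepOpen(RandomAccessFile file, long offset) throws IOException {
        file.seek(offset);
        return file.read();
    }

    // Open-per-read style: no descriptor held between calls, but each random
    // read pays for an open() and close() as well as the seek.
    static int readOpenPerRead(String path, long offset) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            file.seek(offset);
            return file.read();
        }
    }
}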