Message-ID: <2032412298.1231654439669.JavaMail.jira@brutus>
Date: Sat, 10 Jan 2009 22:13:59 -0800 (PST)
From: "Luo Ning (JIRA)"
To: hbase-dev@hadoop.apache.org
Reply-To: hbase-dev@hadoop.apache.org
Subject: [jira] Updated: (HBASE-24) Scaling: Too many open file handles to datanodes

     [ https://issues.apache.org/jira/browse/HBASE-24?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luo Ning updated HBASE-24:
--------------------------

    Attachment: MonitoredReader.java

> Scaling: Too many open file handles to datanodes
> ------------------------------------------------
>
>                 Key: HBASE-24
>                 URL: https://issues.apache.org/jira/browse/HBASE-24
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.20.0
>
>         Attachments: HBASE-823.patch, MonitoredReader.java
>
>
> We've been here before (HADOOP-2341).
> Today Rapleaf gave me an lsof listing from a regionserver. It had thousands of open sockets to datanodes, all in ESTABLISHED or CLOSE_WAIT state. On average they seem to have about ten file descriptors/sockets open per region. (They have 3 column families IIRC. Each family can have between 1 and 5 or so mapfiles open; 3 is the max, but when compacting we open a new one, etc.)
> They have thousands of regions. 400 regions (~100G, which is not that much) take about 4k open file handles.
> If they want a regionserver to serve a decent disk's worth (300-400G), then that's maybe 1600 regions, or 16k file handles. With more than just 3 column families, we are in danger of blowing out limits if they are 32k.
> We've been here before with HADOOP-2341.
> A dfsclient that used non-blocking I/O would help applications like HBase. (The datanode doesn't have this problem as badly: the CLOSE_WAIT sockets on the regionserver side, which are the bulk of the open fds in the Rapleaf log, don't have a corresponding open resource on the datanode end.)
> Could also just open mapfiles as needed, but that'd kill our random read performance, and it's bad enough already.
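
The back-of-the-envelope numbers in the description (about ten fds per region, 400 regions at roughly 4k handles, 1600 regions at roughly 16k, a 32k limit) can be reproduced with a short sketch. The function name, the `FD_LIMIT` constant, and the assumption that handle count grows linearly with the number of column families are illustrative only; they are not taken from HBase or from the issue.

```python
# Rough estimate of open file handles on a regionserver, using the figures
# quoted in the issue: ~10 fds/sockets per region with 3 column families.
# ASSUMPTION: handles scale linearly with the number of column families.

FD_LIMIT = 32 * 1024  # hypothetical per-process limit ("if they are 32k")


def estimated_open_handles(regions, families=3, fds_per_region_at_3_families=10):
    """Return an approximate count of open handles for a regionserver."""
    return round(regions * fds_per_region_at_3_families * families / 3)


if __name__ == "__main__":
    for regions, families in [(400, 3), (1600, 3), (1600, 6)]:
        handles = estimated_open_handles(regions, families)
        share = handles / FD_LIMIT
        print(f"{regions} regions, {families} families: "
              f"~{handles} handles ({share:.0%} of a {FD_LIMIT} limit)")
```

Under these assumptions, doubling the column families at 1600 regions puts the estimate just under the 32k limit, which is the danger the description points at.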