Subject: Re: What should the open file limit be for hbase
From: Marc Harris
To: core-user@hadoop.apache.org
In-Reply-To: <479E1998.6030102@duboce.net>
Date: Mon, 28 Jan 2008 15:58:07 -0500

Well, the datanode log file was full of exceptions like this:

2008-01-28 11:32:44,378 ERROR org.apache.hadoop.dfs.DataNode: 66.135.42.137:50010:DataXceiver: java.io.FileNotFoundException: /lv_main/hadoop/dfs/data/current/subdir11/blk_-1325205341085798084.meta (Too many open files)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at org.apache.hadoop.dfs.FSDataset.getMetaDataInputStream(FSDataset.java:481)
        at org.apache.hadoop.dfs.DataNode$BlockSender.<init>(DataNode.java:1298)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:913)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:865)
        at java.lang.Thread.run(Thread.java:595)

2008-01-28 11:32:45,060 ERROR org.apache.hadoop.dfs.DataNode: 66.135.42.137:50010:DataXceiver: java.io.IOException: Too many open files
        at java.io.UnixFileSystem.createFileExclusively(Native Method)
        at java.io.File.createNewFile(File.java:850)
        at org.apache.hadoop.dfs.FSDataset$FSVolume.createTmpFile(FSDataset.java:329)
        at org.apache.hadoop.dfs.FSDataset.createTmpFile(FSDataset.java:637)
        at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:613)
        at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1498)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:973)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:868)
        at java.lang.Thread.run(Thread.java:595)

2008-01-28 11:32:45,466 ERROR org.apache.hadoop.dfs.DataNode: 66.135.42.137:50010:DataXceiver: java.io.FileNotFoundException: /lv_main/hadoop/dfs/data/tmp/blk_-1866322536983816592.meta (Too many open files)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
        at org.apache.hadoop.dfs.FSDataset.createBlockWriteStreams(FSDataset.java:569)
        at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:624)
        at org.apache.hadoop.dfs.DataNode$BlockReceiver.<init>(DataNode.java:1498)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:973)
        at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:868)
        at java.lang.Thread.run(Thread.java:595)

Stupidly, I did not keep a record of the lsof output. All I remember is that most of the open files seemed to be sockets, not local files. I can send you log files if you want them, but not the lsof output.
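Next time it happens, something along these lines should capture the breakdown (a rough sketch, assuming Linux with a /proc filesystem and a stock lsof; DATANODE_PID stands in for the actual datanode pid):

    # Count open descriptors straight from /proc
    ls /proc/DATANODE_PID/fd | wc -l

    # Break lsof's output down by descriptor TYPE (REG, IPv4, sock, ...)
    lsof -p DATANODE_PID | awk 'NR > 1 {print $5}' | sort | uniq -c | sort -rn

    # Count only the network sockets held by the process (-a ANDs -p with -i)
    lsof -a -p DATANODE_PID -i | wc -l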
- Marc

On Mon, 2008-01-28 at 10:06 -0800, stack wrote:
> Exceptions are complaints about too many open files? Looks like you
> have to up your ulimit given the numbers you are showing below. Odd
> though is that it's the datanode that is overflowing the limit. Want to
> send me the lsof offlist so I can take a gander?
>
> Thanks,
> St.Ack
>
> Marc Harris wrote:
> > Sorry, I should have said 4 families in one table, obviously, not 4
> > regions in one table.
> > - Marc
> >
> > On Mon, 2008-01-28 at 11:57 -0500, Marc Harris wrote:
> >
> >> My schema is very simple: 4 regions in one table.
> >>
> >> create table pagefetch (
> >>   info MAX_VERSIONS=1,
> >>   data MAX_VERSIONS=1,
> >>   headers MAX_VERSIONS=1,
> >>   redirects MAX_VERSIONS=1
> >> );
> >>
> >> I am running hadoop in distributed configuration, but with only one
> >> data node.
> >> I am running hbase with two region servers (one of which is on the
> >> same machine as hadoop).
> >> I am seeing the exceptions in my datanode log file, by the way, not
> >> my regionserver log file.
> >> "lsof -p REGIONSERVER_PID | wc -l" gave 479
> >> "lsof -p DATANODE_PID | wc -l" gave 10287
> >>
> >> - Marc
> >>
> >> On Mon, 2008-01-28 at 08:13 -0800, stack wrote:
> >>
> >>> Hey Marc:
> >>>
> >>> You are still seeing 'too many open files'? What's your schema look
> >>> like? I added to http://wiki.apache.org/hadoop/Hbase/FAQ#5 a rough
> >>> formula for counting how many open mapfiles there are in a running
> >>> regionserver.
> >>>
> >>> Currently, your only recourse is upping the ulimit. Addressing this
> >>> scaling barrier will be a focus of the next hbase release.
> >>>
> >>> St.Ack
> >>>
> >>> Marc Harris wrote:
> >>>
> >>>> I have seen that hbase can cause "too many open files" errors. I
> >>>> increased my limit to 10240 (10 times the previous limit) but still
> >>>> get errors.
> >>>>
> >>>> Is there a recommended value that I should set my open files limit to?
> >>>> Is there something else I can do to reduce the number of files, perhaps
> >>>> with some other trade-off?
> >>>>
> >>>> Thanks
> >>>> - Marc
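P.S. For the "up your ulimit" step, a rough recipe (just a sketch, assuming a Linux box with pam_limits enabled; the "hadoop" user name and the 32768 value are placeholders for whatever account runs the daemons and whatever limit you settle on):

    # Check the limit in the shell that launches the datanode/regionserver
    ulimit -n

    # Raise it persistently by adding these two lines to /etc/security/limits.conf:
    #   hadoop  soft  nofile  32768
    #   hadoop  hard  nofile  32768

    # Log the hadoop user out and back in, confirm the new limit took effect,
    # then restart the daemons so they inherit it
    ulimit -n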