Return-Path: Delivered-To: apmail-hadoop-hbase-user-archive@locus.apache.org Received: (qmail 70108 invoked from network); 23 Jul 2008 00:12:29 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 23 Jul 2008 00:12:29 -0000 Received: (qmail 27805 invoked by uid 500); 23 Jul 2008 00:12:29 -0000 Delivered-To: apmail-hadoop-hbase-user-archive@hadoop.apache.org Received: (qmail 27785 invoked by uid 500); 23 Jul 2008 00:12:29 -0000 Mailing-List: contact hbase-user-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hbase-user@hadoop.apache.org Delivered-To: mailing list hbase-user@hadoop.apache.org Received: (qmail 27774 invoked by uid 99); 23 Jul 2008 00:12:28 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 22 Jul 2008 17:12:28 -0700 X-ASF-Spam-Status: No, hits=0.7 required=10.0 tests=SPF_SOFTFAIL X-Spam-Check-By: apache.org Received-SPF: softfail (athena.apache.org: transitioning domain of eleith@diffbot.com does not designate 76.96.62.64 as permitted sender) Received: from [76.96.62.64] (HELO QMTA07.westchester.pa.mail.comcast.net) (76.96.62.64) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 23 Jul 2008 00:11:32 +0000 Received: from OMTA07.westchester.pa.mail.comcast.net ([76.96.62.59]) by QMTA07.westchester.pa.mail.comcast.net with comcast id tBuj1Z02G1GhbT857CBx0c; Wed, 23 Jul 2008 00:11:57 +0000 Received: from eleith.com ([71.202.237.105]) by OMTA07.westchester.pa.mail.comcast.net with comcast id tCBw1Z00C2H7kJF3TCBwDZ; Wed, 23 Jul 2008 00:11:57 +0000 X-Authority-Analysis: v=1.0 c=1 a=KwfV_qw6WyYA:10 a=vV3a0GF1ME8A:10 a=mV9VRH-2AAAA:8 a=4MsZ2-xNz3afc6wKSXAA:9 a=CkVokg_vXUsIVe6Jb8YA:9 a=hB9SUpuBic4DdPXBxikA:7 a=Z9IstuSfXDPj2uEnBGnG2rmelEEA:4 a=PxHhwJ9AYXMA:10 Received: from [192.168.1.2] (eleith.com [71.202.237.105]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by eleith.com (Postfix) with ESMTPSA id F0916724339 for ; Tue, 22 Jul 2008 17:11:55 -0700 (PDT) Message-ID: <48867760.6040807@diffbot.com> Date: Tue, 22 Jul 2008 17:12:16 -0700 From: leith User-Agent: Thunderbird 2.0.0.14 (X11/20080421) MIME-Version: 1.0 To: hbase-user@hadoop.apache.org Subject: Re: compaction thread always hitting exceptions References: <48863815.70803@diffbot.com> <4886476A.4030506@duboce.net> In-Reply-To: <4886476A.4030506@duboce.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org stack wrote: > Thanks for writing this note. I'd seen the IOE and presumed it a known > issue in hadoop 0.17. I then went looking for it but couldn't find > one. Turns out its a silly issue in hbase (HBASE-761). > > How many regions you have loaded up in your regionserver? > > You are using default heap size? You might try upping it a little. > Also make sure you have upped open file descriptors limit too (See #4 > and #6 in the FAQ, http://wiki.apache.org/hadoop/Hbase/FAQ). > thanks St.Ack, we've reran our tests, uncommented HEAP_SIZE=1000 in our hbase-env, and have updated trunk to the latest. we haven't run into OOME so far, and no sight of all those compaction errors anymore (thanks!). we have 5 regions loaded up in our regionserver our test hammers hbase as we import a bunch of files into our hbase tables. now, what we see is that hbase seems to be blocking such that nothing gets imported. everything is up and running (hdfs namenode/datanodes, hbase master/regionserver) below, are the logs we see from our hbase region server and from the hdfs datanode ---------------------------------- hbase regionserver --------------------------------- 2008-07-22 16:37:10,419 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 396381, skipped 1, 315153 2008-07-22 16:37:10,672 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 258344, skipped 1, 204628 2008-07-22 16:37:10,856 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 342049, skipped 1, 271105 2008-07-22 16:37:11,080 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 237573, skipped 1, 188148 2008-07-22 16:37:11,280 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 78030028, skipped 1, 39326312 2008-07-22 16:37:33,464 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 366849, skipped 1, 291596 2008-07-22 16:37:33,928 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 984138, skipped 2, 897309 2008-07-22 16:37:34,201 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 67294, skipped 1, 51834 2008-07-22 16:37:34,643 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 45569, skipped 2, 41020 2008-07-22 16:37:35,061 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region dmls,,1216768730386 in 25sec 2008-07-22 16:38:00,151 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region dmls,,1216768730386 2008-07-22 16:38:00,157 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 413508, skipped 2, 396083 2008-07-22 16:38:00,502 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 269288, skipped 2, 258024 2008-07-22 16:38:00,789 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 356584, skipped 2, 341743 2008-07-22 16:38:01,032 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 247637, skipped 2, 237273 2008-07-22 16:38:01,337 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 83129653, skipped 0, 0 2008-07-22 16:38:42,495 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 382791, skipped 2, 366533 2008-07-22 16:38:42,724 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 1024454, skipped 2, 897309 2008-07-22 16:38:42,959 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 70464, skipped 2, 66950 2008-07-22 16:38:43,377 INFO org.apache.hadoop.hbase.regionserver.HStore: Compaction size 47891, skipped 2, 41020 2008-07-22 16:38:43,716 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region dmls,,1216768730386 in 43sec 2008-07-22 16:38:53,954 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 4 on 60020' on region dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size 2008-07-22 16:40:03,931 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 9 on 60020' on region dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size 2008-07-22 16:41:13,943 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size 2008-07-22 16:42:23,955 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 0 on 60020' on region dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size 2008-07-22 16:43:33,968 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 2 on 60020' on region dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size 2008-07-22 16:44:34,095 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 5 on 60020' on region dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size 2008-07-22 16:45:44,109 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 3 on 60020' on region dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size 2008-07-22 16:46:54,122 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 8 on 60020' on region dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size 2008-07-22 16:48:04,135 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 6 on 60020' on region dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size 2008-07-22 16:48:18,116 INFO org.apache.hadoop.hbase.regionserver.LogRoller: Rolling hlog. Number of entries: 26294 2008-07-22 16:48:18,133 INFO org.apache.hadoop.hbase.regionserver.HLog: New log writer created at /hbase/log_64.62.244.2_1216768695534_60020/hlog.dat.1216770498127 2008-07-22 16:49:14,165 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 7 on 60020' on region dmls,,1216768730386: Memcache size 64.0m is >= than blocking 64.0m size --------------------------------------- hdfs datanode -------------------------------------- 2008-07-22 17:10:41,828 WARN org.apache.hadoop.dfs.DataNode: 64.62.244.2:50010:Got exception while serving blk_-355911506373371046 to /64.62.244.2: java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcher.write0(Native Method) at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104) at sun.nio.ch.IOUtil.write(IOUtil.java:75) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334) at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:53) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) at java.io.DataOutputStream.write(DataOutputStream.java:90) at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunks(DataNode.java:1774) at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1813) at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1039) at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:968) at java.lang.Thread.run(Thread.java:619) 2008-07-22 17:10:41,828 ERROR org.apache.hadoop.dfs.DataNode: 64.62.244.2:50010:DataXceiver: java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcher.write0(Native Method) at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104) at sun.nio.ch.IOUtil.write(IOUtil.java:75) at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334) at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:53) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:140) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144) at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) at java.io.DataOutputStream.write(DataOutputStream.java:90) at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunks(DataNode.java:1774) at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1813) at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1039) at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:968) at java.lang.Thread.run(Thread.java:619) 2008-07-22 17:10:41,834 INFO org.apache.hadoop.dfs.DataNode: 64.62.244.2:50010 Served block blk_8154627018748859939 to /64.62.244.2 2008-07-22 17:10:41,843 INFO org.apache.hadoop.dfs.DataNode: 64.62.244.2:50010 Served block blk_6062812905615890844 to /64.62.244.2 2008-07-22 17:10:41,987 INFO org.apache.hadoop.dfs.DataNode: 64.62.244.2:50010 Served block blk_-3883168343557890264 to /64.62.244.2 > St.Ack > > leith wrote: >> looking at our region logs, we've noticed that the compaction thread >> constantly runs into exceptions. the entire log is filled with >> something like this: >> >> ---------------------------------- >> 2008-07-22 12:29:52,759 WARN >> org.apache.hadoop.hbase.regionserver.HStore: Exception closing reader >> for 242866774/new >> java.io.IOException: Stream closed >> at >> org.apache.hadoop.dfs.DFSClient$DFSInputStream.close(DFSClient.java:1319) >> at java.io.FilterInputStream.close(FilterInputStream.java:155) >> at >> org.apache.hadoop.io.SequenceFile$Reader.close(SequenceFile.java:1581) >> at org.apache.hadoop.io.MapFile$Reader.close(MapFile.java:577) >> at >> org.apache.hadoop.hbase.regionserver.HStore.closeCompactionReaders(HStore.java:917) >> >> at >> org.apache.hadoop.hbase.regionserver.HStore.compactHStoreFiles(HStore.java:910) >> >> at >> org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:787) >> at >> org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:887) >> >> at >> org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:847) >> >> at >> org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:84) >> >> ------------------------------------- >> >> the regionserver is taking about up a good amount of memory on our >> system, and nothing is happening except for i assume compaction/split >> processes. >> >> these only seem to be warnings, but there is so many of them, it would >> be nice to get a second opinion on this. >> >> we've also gotten an 'out of memory' exception a few times from the >> compaction thread, and those actually ended up killing the thread, >> resulting in the region server shutting itself down. >> >> thanks, >> >> /leith >> > >