Date: Wed, 13 Nov 2013 08:55:30 +0000 (UTC)
From: "Liang Xie (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-8143) HBase on Hadoop 2 with local short circuit reads (ssr) causes OOM

    [ https://issues.apache.org/jira/browse/HBASE-8143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13821076#comment-13821076 ]

Liang Xie commented on HBASE-8143:
----------------------------------

FYI, I filed https://issues.apache.org/jira/browse/HDFS-5461, which I hope will alleviate this issue as well. Any comments are welcome :)

> HBase on Hadoop 2 with local short circuit reads (ssr) causes OOM
> ------------------------------------------------------------------
>
>                 Key: HBASE-8143
>                 URL: https://issues.apache.org/jira/browse/HBASE-8143
>             Project: HBase
>          Issue Type: Bug
>          Components: hadoop2
>    Affects Versions: 0.98.0, 0.94.7, 0.95.0
>            Reporter: Enis Soztutar
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.98.0, 0.96.1
>
>         Attachments: 8143.hbase-default.xml.txt, 8143doc.txt, 8143v2.txt, OpenFileTest.java
>
>
> We've run into an issue with HBase 0.94 on Hadoop 2 with SSR turned on: the memory usage of the HBase process grows to 7g on an -Xmx3g heap, and after some time this causes OOM for the RSs.
> Upon further investigation, I found that we end up with 200 regions, each having 3-4 store files open. Under Hadoop 2 SSR, BlockReaderLocal allocates DirectBuffers, unlike HDFS 1 where there is no direct buffer allocation.
> It seems that there are no guards against the memory used by local buffers in HDFS 2, and having a large number of open files causes multiple GB of memory to be consumed by the RS process.
> This issue is to further investigate what is going on: whether we can limit the memory usage in HDFS or HBase, and/or document the setup.
> Possible mitigation scenarios are:
> - Turn off SSR for Hadoop 2
> - Ensure that there is enough unallocated memory for the RS based on the expected # of store files
> - Ensure that there is a lower number of regions per region server (and hence fewer open files)
> Stack trace:
> {code}
> org.apache.hadoop.hbase.DroppedSnapshotException: region: IntegrationTestLoadAndVerify,yC^P\xD7\x945\xD4,1363388517630.24655343d8d356ef708732f34cfe8946.
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1560)
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1439)
>         at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1380)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:449)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushOneForGlobalPressure(MemStoreFlusher.java:215)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$500(MemStoreFlusher.java:63)
>         at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:237)
>         at java.lang.Thread.run(Thread.java:662)
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
>         at java.nio.Bits.reserveMemory(Bits.java:632)
>         at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:97)
>         at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
>         at org.apache.hadoop.hdfs.util.DirectBufferPool.getBuffer(DirectBufferPool.java:70)
>         at org.apache.hadoop.hdfs.BlockReaderLocal.<init>(BlockReaderLocal.java:315)
>         at org.apache.hadoop.hdfs.BlockReaderLocal.newBlockReader(BlockReaderLocal.java:208)
>         at org.apache.hadoop.hdfs.DFSClient.getLocalBlockReader(DFSClient.java:790)
>         at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:888)
>         at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:455)
>         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:645)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:689)
>         at java.io.DataInputStream.readFully(DataInputStream.java:178)
>         at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.readFromStream(FixedFileTrailer.java:312)
>         at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:543)
>         at org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:589)
>         at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1261)
>         at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:512)
>         at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:603)
>         at org.apache.hadoop.hbase.regionserver.Store.validateStoreFile(Store.java:1568)
>         at org.apache.hadoop.hbase.regionserver.Store.commitFile(Store.java:845)
>         at org.apache.hadoop.hbase.regionserver.Store.access$500(Store.java:109)
>         at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.commit(Store.java:2209)
>         at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1541)
> {code}

--
This message was sent by Atlassian JIRA
(v6.1#6144)
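For context on why the RS dies off-heap while -Xmx3g looks healthy: each BlockReaderLocal created for an SSR read takes a buffer from DirectBufferPool, and direct buffers are charged against -XX:MaxDirectMemorySize rather than the Java heap, so hundreds of concurrently open store files can exhaust that budget. Below is a minimal sketch of that failure mode; it is not the attached OpenFileTest.java, and the class name DirectBufferOom, the 1 MB per-reader buffer size, and the 800-reader count are illustrative assumptions only.

{code}
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Sketch: every "open reader" pins one direct (off-heap) buffer, the way
// BlockReaderLocal does via DirectBufferPool. Run with something like:
//   java -Xmx3g -XX:MaxDirectMemorySize=256m DirectBufferOom
// to get java.lang.OutOfMemoryError: Direct buffer memory, as in the
// stack trace above.
public class DirectBufferOom {
  // Illustrative per-reader buffer size; the real size comes from the
  // HDFS client configuration.
  private static final int BUFFER_SIZE = 1024 * 1024;

  public static void main(String[] args) {
    List<ByteBuffer> pinnedBuffers = new ArrayList<ByteBuffer>();
    // Roughly 200 regions * 4 store files = 800 concurrently open readers.
    for (int i = 0; i < 800; i++) {
      // Each allocation is reserved via java.nio.Bits against the
      // MaxDirectMemorySize budget, not against -Xmx.
      pinnedBuffers.add(ByteBuffer.allocateDirect(BUFFER_SIZE));
    }
    System.out.println("allocated " + pinnedBuffers.size() + " direct buffers");
  }
}
{code}

The mitigations listed above map onto this directly: disabling SSR on the client side (dfs.client.read.shortcircuit=false) removes the direct allocations entirely, while the other two options amount to keeping the product of open store files and per-reader buffer size under whatever direct-memory headroom the RS host has beyond -Xmx.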