Message-ID: <10670980.66711292022541025.JavaMail.jira@thor>
Date: Fri, 10 Dec 2010 18:09:01 -0500 (EST)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] Commented: (HBASE-3323) OOME in master splitting logs
In-Reply-To: <14893210.32711291881781771.JavaMail.jira@thor>

    [ https://issues.apache.org/jira/browse/HBASE-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970333#action_12970333 ]

stack commented on HBASE-3323:
------------------------------

I'm part way through a review but have to leave.
So far it looks like fewer moving parts and cleaner overall. Will finish up the review tomorrow. We could set the number of splits to 1 and ship the RC with that, but at the moment, going by the other issues that need fixing, it's looking like next week before a new RC, and that might give time to test this redo of splits.

> OOME in master splitting logs
> -----------------------------
>
>                 Key: HBASE-3323
>                 URL: https://issues.apache.org/jira/browse/HBASE-3323
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Blocker
>             Fix For: 0.90.0
>
>         Attachments: hbase-3323.txt, sizes.png
>
>
> In testing a RS failure under heavy increment workload I ran into an OOME when the master was splitting the logs.
> In this test case, I have exactly 136 bytes per log entry in all the logs, and the logs are all around 66-74MB. With a batch size of 3 logs, this means the master is loading about 500K-600K edits per log file. Each edit ends up creating 3 byte[] objects, the references for which are each 8 bytes of RAM, so we have 160 (136+8*3) bytes per edit used by the byte[]. For each edit we also allocate a bunch of other objects: one HLog$Entry, one WALEdit, one ArrayList, one LinkedList$Entry, one HLogKey, and one KeyValue. Overall this works out to 400 bytes of overhead per edit. So, with the default settings on this fairly average workload, the 1.5M log entries take about 770MB of RAM. Since I had a few log files that were a bit larger (around 90MB), it exceeded 1GB of RAM and I got an OOME.
> For one, the 400 bytes per edit overhead is pretty bad, and we could probably be a lot more efficient. For two, we should actually account for this rather than simply having a configurable "batch size" in the master.
> I think this is a blocker because I'm running with fairly default configs here and just killing one RS made the cluster fall over due to master OOME.
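
The per-edit arithmetic in the quoted description can be checked with a short sketch. This is only a back-of-the-envelope calculation using the figures from the report (136 bytes per entry, three 8-byte references, roughly 400 bytes of other objects); the function and constant names are hypothetical and are not part of HBase. The report does not show the exact inputs behind its 770MB figure, so the numbers here land in the same range rather than matching it exactly.

```python
# Rough heap estimate for log splitting, using figures quoted in the report.
# All names below are illustrative; none of this is HBase code.

ENTRY_BYTES = 136          # payload bytes per log entry (from the report)
BYTE_ARRAYS_PER_EDIT = 3   # byte[] objects created per edit (from the report)
REF_BYTES = 8              # bytes per byte[] reference (from the report)
OBJECT_OVERHEAD = 400      # other per-edit objects, approximate (from the report)


def bytes_per_edit() -> int:
    """Heap bytes attributed to one edit under the report's assumptions."""
    return ENTRY_BYTES + BYTE_ARRAYS_PER_EDIT * REF_BYTES + OBJECT_OVERHEAD


def edits_in_log(log_bytes: int) -> int:
    """Number of fixed-size entries that fit in a log of the given size."""
    return log_bytes // ENTRY_BYTES


def estimated_heap_bytes(num_edits: int) -> int:
    """Heap needed to hold num_edits edits in memory at once."""
    return num_edits * bytes_per_edit()


def max_edits_for_budget(heap_budget_bytes: int) -> int:
    """Largest number of edits that fits in a heap budget (size-based limit)."""
    return heap_budget_bytes // bytes_per_edit()


if __name__ == "__main__":
    print(bytes_per_edit())                       # 136 + 24 + 400
    print(estimated_heap_bytes(1_500_000))        # 1.5M edits held at once
    three_large_logs = 3 * edits_in_log(90_000_000)
    print(estimated_heap_bytes(three_large_logs)) # a batch of three 90MB logs
    print(max_edits_for_budget(1_000_000_000))    # edits that fit in 1GB
```

Under these assumptions a batch of three 90MB logs needs more than 1GB, which is consistent with the OOME described. The last function illustrates the report's second point: limiting a batch by estimated bytes rather than by a fixed count of log files.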
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.