Date: Thu, 25 May 2017 18:45:04 +0000 (UTC)
From: "Andrew Purtell (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-18116) Replication buffer quota accounting should not include bulk transfer hfiles

    [ https://issues.apache.org/jira/browse/HBASE-18116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025196#comment-16025196 ]

Andrew Purtell commented on HBASE-18116:
----------------------------------------

Also, when calculating the heap size of a replication queue entry we only track the WALEdit objects, not the associated WALKey objects.

> Replication buffer quota accounting should not include bulk transfer hfiles
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-18116
>                 URL: https://issues.apache.org/jira/browse/HBASE-18116
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>            Reporter: Andrew Purtell
>
> In ReplicationSourceWALReaderThread we maintain a global quota on enqueued replication work, to prevent OOM from queuing up too many edits on heap. When calculating the size of a given replication queue entry, if it has associated hfiles (that is, it is a bulk load to be replicated as a batch of hfiles), we get the file sizes and include the sum. We then apply that result to the quota. This isn't quite right. Those hfiles will be pulled by the sink as a file copy, not pushed by the source. The cells in those files are not queued in memory at the source and therefore shouldn't be counted against the quota.
> Related, the sum of the hfile sizes is also included when checking if queued work exceeds the configured replication queue capacity, which is by default 64 MB. HFiles are commonly much larger than this.
> So when we encounter a bulk load replication entry, typically both the quota and capacity limits are exceeded, and we break out of the loops and send right away. What is transferred on the wire via HBase RPC, though, has only a partial relationship to the calculation.
> Depending on how you look at it, it makes sense to count hfile sizes against the replication queue capacity limit. The sink will be occupied transferring those files at the HDFS level. Anyway, this is how we have been doing it and it is too late to change now. I do not, however, think it is correct to apply hfile sizes against a quota for in-memory state on the source. The source doesn't queue or even transfer those bytes.
> Something I noticed while working on HBASE-18027.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
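
To make the distinction in this issue concrete, here is a minimal sketch in Java of the two size calculations it argues for. This is not the HBase code: the class name, the Entry type, and the methods quotaBytes and capacityBytes are all hypothetical, the byte counts are made-up inputs, and counting the key's heap size alongside the edit's is an assumption drawn from the comment about WALKey objects. Only the 64 MB default capacity comes from the issue text.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the proposed accounting; none of these names are HBase APIs. */
public class ReplicationAccountingSketch {

    /** Illustrative stand-in for one queued replication entry. */
    public static final class Entry {
        final long editHeapBytes;     // heap held by the edit's cells
        final long keyHeapBytes;      // heap held by the associated key (assumed to be tracked)
        final List<Long> hfileBytes;  // sizes of referenced bulk load hfiles; empty if none

        public Entry(long editHeapBytes, long keyHeapBytes, List<Long> hfileBytes) {
            this.editHeapBytes = editHeapBytes;
            this.keyHeapBytes = keyHeapBytes;
            this.hfileBytes = hfileBytes;
        }
    }

    /** Bytes charged to the in-memory quota: only what actually sits on the source heap. */
    public static long quotaBytes(Entry e) {
        return e.editHeapBytes + e.keyHeapBytes;
    }

    /** Bytes counted toward batch capacity: heap bytes plus hfile sizes, as is done today. */
    public static long capacityBytes(Entry e) {
        long total = quotaBytes(e);
        for (long size : e.hfileBytes) {
            total += size;
        }
        return total;
    }

    public static void main(String[] args) {
        long capacityLimit = 64L * 1024 * 1024; // default capacity stated in the issue
        // A bulk load entry: a small edit and key referencing one 1 GiB hfile.
        Entry bulkLoad = new Entry(1024L, 100L, Arrays.asList(1L << 30));
        System.out.println("charged to quota: " + quotaBytes(bulkLoad));
        System.out.println("counted toward capacity: " + capacityBytes(bulkLoad));
        System.out.println("exceeds capacity: " + (capacityBytes(bulkLoad) > capacityLimit));
    }
}
```

With this split, a bulk load entry still trips the capacity check and is sent right away, but it charges only its small heap footprint to the global quota.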