Date: Fri, 26 May 2017 07:51:05 +0000 (UTC)
From: "Ashu Pachauri (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-18027) Replication should respect RPC size limits when batching edits

    [ https://issues.apache.org/jira/browse/HBASE-18027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025944#comment-16025944 ]

Ashu Pachauri commented on HBASE-18027:
---------------------------------------

bq. Instead we try to avoid creating an overlarge RPC by setting the replication queue capacity limit to the lesser of replication.source.size.capacity or 95% of the RPC size limit.

I think this does not solve the underlying problem and can still result in stuck replication due to requests being too large. I was also reading through the rest of the discussion here.
I think handling the RPC size limit inside ReplicationSource does not really make sense, because the endpoint partitions the batch before making the RPC. So it makes sense for the endpoint to enforce the RPC size limit. Changing the batching strategy in the endpoint actually gives us two benefits:
1. The purpose of this jira, i.e. enforcing the RPC size limit.
2. Replication performance currently suffers from the fact that, within a batch, there is never more than a single thread shipping edits for a given region. So, if a region is receiving heavy traffic on the source cluster, replication performs very poorly.

As for the implementation, I don't think you need to maintain a lot of state in the endpoint to enforce the RPC limit. All that needs to be done is to split a single batch into multiple batches whenever it would exceed the RPC size limit. I think something like the following would work:

{code}
private List<List<Entry>> createBatches(List<Entry> entries) {
  int maxBatchSize = (int) (0.95 * conf.getInt(RpcServer.MAX_REQUEST_SIZE, RpcServer.DEFAULT_MAX_REQUEST_SIZE));
  int numSinks = Math.max(replicationSinkMgr.getNumSinks(), 1);
  int n = Math.min(Math.min(this.maxThreads, entries.size() / 100 + 1), numSinks);
  // Maintains the current batch for a given partition index
  Map<Integer, List<Entry>> entryMap = new HashMap<>(n);
  List<List<Entry>> entryLists = new ArrayList<>();
  int[] sizes = new int[n];
  for (int i = 0; i < n; i++) {
    entryMap.put(i, new ArrayList<Entry>(entries.size() / n + 1));
  }
  for (Entry e : entries) {
    int index = Math.abs(Bytes.hashCode(e.getKey().getEncodedRegionName()) % n);
    int entrySize = estimatedSize(e);
    // If adding this entry would make the batch oversized, add the batch to the final list
    // and start a new empty batch for this partition
    if (sizes[index] + entrySize > maxBatchSize) {
      entryLists.add(entryMap.get(index));
      entryMap.put(index, new ArrayList<Entry>());
      sizes[index] = 0;
    }
    entryMap.get(index).add(e);
    sizes[index] += entrySize;
  }
  entryLists.addAll(entryMap.values());
  return entryLists;
}
{code}


> Replication should respect RPC size limits when batching edits
> --------------------------------------------------------------
>
>                 Key: HBASE-18027
>                 URL: https://issues.apache.org/jira/browse/HBASE-18027
>             Project: HBase
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 2.0.0, 1.4.0, 1.3.1
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>             Fix For: 2.0.0, 1.4.0, 1.3.2
>
>         Attachments: HBASE-18027-branch-1.patch, HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch, HBASE-18027.patch
>
>
> In HBaseInterClusterReplicationEndpoint#replicate we try to replicate in batches. We create N lists, where N is the minimum of the configured number of replicator threads, the number of 100-waledit batches, and the number of current sinks. Every pending entry in the replication context is then placed, by hash of its encoded region name, into one of these N lists, and each list is sent all at once in one replication RPC. We do not check whether the sum of data in a list exceeds the RPC size limit; the code presumes each individual edit is reasonably small. Not checking the aggregate size while assembling the lists into RPCs is an oversight and can lead to replication failure when that assumption is violated.
> We can fix this by generating as many replication RPC calls as we need to drain a list, keeping each RPC under the limit, instead of assuming the whole list will fit in one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
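
The createBatches sketch in the comment above relies on an estimatedSize(Entry) helper that is not shown. As a rough sketch only (the use of WALEdit#getCells and CellUtil#estimatedSerializedSizeOf here is an assumption, not something specified in the comment), it could approximate an entry's contribution to the RPC payload like this:

{code}
// Sketch of the estimatedSize(Entry) helper assumed by createBatches above.
// Approximates the entry's contribution to the RPC request size by summing
// the estimated serialized size of its cells.
private int estimatedSize(Entry e) {
  long size = 0;
  for (Cell cell : e.getEdit().getCells()) {
    size += CellUtil.estimatedSerializedSizeOf(cell);
  }
  return (int) size;
}
{code}

Any reasonable per-entry estimate should do here, since the 0.95 factor applied to the configured maximum request size already leaves some headroom for protobuf framing overhead.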