Date: Mon, 24 Jun 2013 19:06:22 +0000 (UTC)
From: "Nicolas Liochon (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

     [ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Liochon updated HBASE-6295:
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.95.2
     Release Note: The puts are now streamed, i.e. sent asynchronously to the region servers, if autoflush is set to false. If a region server is slow or does not respond, its puts are kept in the write buffer while the others are sent to their respective region servers, until the write buffer is full. This feature keeps the semantics of the interface that already exists in 0.94 when using autoflush.
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

> Possible performance improvement in client batch operations: presplit and send in background
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6295
>                 URL: https://issues.apache.org/jira/browse/HBASE-6295
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, Performance
>    Affects Versions: 0.95.2
>            Reporter: Nicolas Liochon
>            Assignee: Nicolas Liochon
>              Labels: noob
>             Fix For: 0.98.0, 0.95.2
>
>         Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v14.patch, 6295.v15.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>
> Today the batch algorithm is:
> {noformat}
> for Operation o: List {
>   add o to todolist
>   if todolist > maxsize or o last in list
>     split todolist per location
>     send split lists to region servers
>     clear todolist
>     wait
> }
> {noformat}
> We could:
> - create the final object immediately instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send it when there is enough data for a single location
> It would be:
> {noformat}
> for Operation o: List {
>   get location
>   add o to location.todolist
>   if (location.todolist > maxLocationSize)
>     send location.todolist to region server
>     clear location.todolist
>     // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write once you add error management: the retried list must be shared with the operations added to the todolist. But it's doable.
> It's mainly interesting for 'big' writes.
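
For illustration only, the second loop in the description could be sketched in plain Java roughly as follows. This is not the code from the attached patches and it uses no HBase classes: PerLocationBatcher, locator and sender are hypothetical names, the location is reduced to a String, and the threshold is a simple operation count rather than a buffer size. Error management and retries, which the description calls the hard part, are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical sketch of "split per location, send in background". Not HBase client code. */
class PerLocationBatcher<T> {
    private final int maxLocationSize;
    // Assumption: stands in for the region location lookup.
    private final Function<T, String> locator;
    // Assumption: stands in for the call that ships one list to one region server.
    private final BiConsumer<String, List<T>> sender;
    private final ExecutorService pool;

    // One todo list per location. Only the calling thread touches this map.
    private final Map<String, List<T>> todo = new HashMap<>();
    private final List<CompletableFuture<Void>> inFlight = new ArrayList<>();

    PerLocationBatcher(int maxLocationSize,
                       Function<T, String> locator,
                       BiConsumer<String, List<T>> sender,
                       ExecutorService pool) {
        this.maxLocationSize = maxLocationSize;
        this.locator = locator;
        this.sender = sender;
        this.pool = pool;
    }

    /** Loop body: get location, add to its list, send that list once it is large enough. */
    void add(T op) {
        String location = locator.apply(op);
        List<T> list = todo.computeIfAbsent(location, k -> new ArrayList<>());
        list.add(op);
        if (list.size() >= maxLocationSize) {
            todo.remove(location);   // "clear location.todolist"
            send(location, list);    // don't wait, continue the loop
        }
    }

    /** "send remaining" followed by "wait". */
    void flush() {
        todo.forEach(this::send);
        todo.clear();
        // join() rethrows a sender failure wrapped in CompletionException; no retry here.
        CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
        inFlight.clear();
    }

    private void send(String location, List<T> batch) {
        inFlight.add(CompletableFuture.runAsync(() -> sender.accept(location, batch), pool));
    }
}
```

A caller would invoke add() for each operation and flush() once at the end. Because each list is handed to the sender only after it has been removed from the map, the background thread never sees a list that is still being filled. A slow location delays only flush(), not the other locations.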