Date: Mon, 17 Jun 2013 11:34:21 +0000 (UTC)
From: "Jean-Marc Spaggiari (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-6295) Possible performance improvement in client batch operations: presplit and send in background

    [ https://issues.apache.org/jira/browse/HBASE-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13685466#comment-13685466 ]

Jean-Marc Spaggiari commented on HBASE-6295:
--------------------------------------------

Results...

First, I'm getting a lot of these in the new version:

2013-06-17 02:02:01,660 INFO  [hbase-table-pool-6-thread-1] client.AsyncProcess: Attempt #1 failed for 1395 operations on server hbasetest,56046,1371448843669, resubmitting 1395, tableName=TestTable, last exception was: org.apache.hadoop.hbase.exceptions.NotServingRegionException: org.apache.hadoop.hbase.exceptions.NotServingRegionException: TestTable,00000000000000000000057204,1371448911838.a2579d421e3a844ef5cc87d84219defe. is closing
	at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5347)
	at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5315)
	at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:1921)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:3954)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:3915)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3271)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:20938)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2122)
	at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1829)

I don't get any of these on the trunk version.

RandomWriteTests on your version took an average of 69712 seconds
RandomWriteTests on trunk took an average of 112591.3 seconds

So I can see an improvement, but I now need to figure out whether the data is correct.

I have the results for the reads too. I will extract them and post them here. I will also run some other tests to check whether the data is correct.

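For readers following along, here is a rough sketch of the per-location buffering proposed in the issue description quoted below. It is not the code from the attached patches and does not use any HBase API: `PerLocationBatcher`, `locator`, and `sender` are hypothetical names, the location is reduced to a plain string, and retry handling (which the description calls the hard part) is left out entirely. It assumes a single thread calls `add` and `flush`, and it sends a buffer once it reaches `maxLocationSize` rather than when it exceeds it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical sketch: one todo list per location, sent in the background when full. */
class PerLocationBatcher<T> {
    private final int maxLocationSize;
    private final Function<T, String> locator;         // stand-in for the region location lookup
    private final BiConsumer<String, List<T>> sender;  // stand-in for the call to a region server
    private final ExecutorService pool;
    private final Map<String, List<T>> todo = new HashMap<>();
    private final List<CompletableFuture<Void>> inFlight = new ArrayList<>();

    PerLocationBatcher(int maxLocationSize,
                       Function<T, String> locator,
                       BiConsumer<String, List<T>> sender,
                       ExecutorService pool) {
        this.maxLocationSize = maxLocationSize;
        this.locator = locator;
        this.sender = sender;
        this.pool = pool;
    }

    /** Adds one operation; sends that location's list without waiting once it is full. */
    void add(T op) {
        String location = locator.apply(op);
        List<T> list = todo.computeIfAbsent(location, k -> new ArrayList<>());
        list.add(op);
        if (list.size() >= maxLocationSize) {
            todo.remove(location);   // "clear location.todolist": the next add starts a fresh list
            send(location, list);    // don't wait, continue the loop
        }
    }

    /** "send remaining" followed by "wait": blocks until every background send has finished. */
    void flush() {
        for (Map.Entry<String, List<T>> e : todo.entrySet()) {
            send(e.getKey(), e.getValue());
        }
        todo.clear();
        CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
        inFlight.clear();
    }

    private void send(String location, List<T> batch) {
        inFlight.add(CompletableFuture.runAsync(() -> sender.accept(location, batch), pool));
    }
}
```

A caller would loop over its operations calling `add`, then call `flush` once at the end. A failed send surfaces only as an exception from `flush` here; a real client would have to merge the failed operations back into the per-location lists, which is where retries such as the "resubmitting 1395" message above would come in.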
> Possible performance improvement in client batch operations: presplit and send in background
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6295
>                 URL: https://issues.apache.org/jira/browse/HBASE-6295
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client, Performance
>    Affects Versions: 0.95.2
>            Reporter: Nicolas Liochon
>            Assignee: Nicolas Liochon
>              Labels: noob
>             Fix For: 0.98.0
>
>         Attachments: 6295.v11.patch, 6295.v12.patch, 6295.v1.patch, 6295.v2.patch, 6295.v3.patch, 6295.v4.patch, 6295.v5.patch, 6295.v6.patch, 6295.v8.patch, 6295.v9.patch
>
>
> today batch algo is:
> {noformat}
> for Operation o: List{
>   add o to todolist
>   if todolist > maxsize or o last in list
>     split todolist per location
>     send split lists to region servers
>     clear todolist
>     wait
> }
> {noformat}
> We could:
> - create immediately the final object instead of an intermediate array
> - split per location immediately
> - instead of sending when the list as a whole is full, send it when there is enough data for a single location
> It would be:
> {noformat}
> for Operation o: List{
>   get location
>   add o to todo location.todolist
>   if (location.todolist > maxLocationSize)
>     send location.todolist to region server
>     clear location.todolist
>     // don't wait, continue the loop
> }
> send remaining
> wait
> {noformat}
> It's not trivial to write if you add error management: retried list must be shared with the operations added in the todolist. But it's doable.
> It's interesting mainly for 'big' writes

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira