To: dev@tephra.incubator.apache.org
From: Micael Capitão
Subject: TransactionCodec poor performance
Date: Wed, 31 May 2017 09:49:09 +0100

Hi all,

I've been testing Tephra 0.11.0 for a project that may need transactions on top of HBase, and I find its performance, for instance for a bulk load, very poor.
Let's not discuss why I am doing a bulk load with transactions.

In my use case I am generating batches of ~10000 elements and inserting them with the *put(List<Put> puts)* method. There are no concurrent writers or readers. If I do the puts without transactions, a batch takes ~0.5s. If I use the *TransactionAwareHTable*, it takes ~12s.

I've tracked down the performance killer to *addToOperation(OperationWithAttributes op, Transaction tx)*, more specifically the *txCodec.encode(tx)* call. I created a TransactionAwareHTableFix with the *addToOperation(txPut, tx)* call commented out, used it in my code, and each batch went back to taking ~0.5s.

I've noticed that inside the *TransactionCodec* you instantiate a new TSerializer and TDeserializer on each call to encode/decode. I tried instantiating the serializer/deserializer in the constructor instead, but even that way each of my batches took the same ~12s.

Further investigation showed me that a Transaction instance, after being encoded by the TransactionCodec, is 104171 bytes long. So in my batch of 10000 elements, ~970MB is metadata. Is that supposed to happen?

Regards,
Micael Capitão
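For reference, here is how the overhead multiplies out, as a quick back-of-the-envelope sketch; the encoded-transaction size and batch size are the ones measured above, and the point is simply that every Put in the batch carries its own full copy of the encoded transaction:

```java
// Back-of-the-envelope check of the transaction metadata overhead:
// one encoded Transaction (104171 bytes, as measured) is attached to
// each of the 10000 Puts in a batch.
public class TxOverhead {
    public static void main(String[] args) {
        long encodedTxBytes = 104_171L; // size of one encoded Transaction
        long putsPerBatch = 10_000L;    // elements per batch

        long totalBytes = encodedTxBytes * putsPerBatch;
        double gib = totalBytes / (1024.0 * 1024.0 * 1024.0);

        // ~1.04e9 bytes, i.e. roughly 0.97 GiB of metadata per batch
        System.out.printf("metadata per batch: %d bytes (~%.2f GiB)%n",
                totalBytes, gib);
    }
}
```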