Mailing-List: contact dev-help@tephra.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@tephra.incubator.apache.org
Date: Wed, 7 Jun 2017 09:56:18 +0000 (UTC)
From: "Micael Capitão (JIRA)" <jira@apache.org>
To: dev@tephra.incubator.apache.org
Message-ID: <JIRA.13077610.1496743190000.17022.1496829378466@Atlassian.JIRA>
In-Reply-To: <JIRA.13077610.1496743190000@Atlassian.JIRA>
References: <JIRA.13077610.1496743190000@Atlassian.JIRA> <JIRA.13077610.1496743190261@jira-lw-us.apache.org>
Subject: [jira] [Updated] (TEPHRA-232) Transaction metadata sent on each put
 is too big
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
archived-at: Wed, 07 Jun 2017 09:56:24 -0000


     [ https://issues.apache.org/jira/browse/TEPHRA-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Micael Capitão updated TEPHRA-232:
----------------------------------
    Priority: Minor  (was: Critical)

> Transaction metadata sent on each put is too big
> ------------------------------------------------
>
>                 Key: TEPHRA-232
>                 URL: https://issues.apache.org/jira/browse/TEPHRA-232
>             Project: Tephra
>          Issue Type: Bug
>    Affects Versions: 0.11.0-incubating, 0.12.0-incubating
>         Environment: HBase 1.2.0-cdh5.11
> CentOS 7.3
> 4x machines
> Bandwidth between machines 1Gbps
>            Reporter: Micael Capitão
>            Assignee: Poorna Chandra
>            Priority: Minor
>
> I've been testing Tephra 0.11.0 (and more recently 0.12.0) for a project that may need transactions on top of HBase, and I find its performance very poor, for instance for a bulk load. Let's not discuss why I am doing a bulk load with transactions.
> In my use case I am generating batches of ~10000 elements and inserting them with the *put(List<Put> puts)* method. There are no concurrent writers or readers.
> If I do the put without transactions it takes ~0.5s. If I use the *TransactionAwareHTable* it takes ~12s.
> In both cases the network bandwidth is fully utilised.
> I've tracked down the performance killer to be the *addToOperation(OperationWithAttributes op, Transaction tx)* call on the TransactionAwareHTable.
> I've created a TransactionAwareHTableFix with the *addToOperation(txPut, tx)* call commented out, and used it in my code, and each batch started to take ~0.5s.
> Then I checked what was being done inside the *addToOperation* method and verified that the issue has something to do with the serialization of the Transaction object. The serialized Transaction object is 104171 bytes long. Since it is attached to each put, my batch of ~10000 elements carries ~970MB of serialized transactions, which explains the 12s vs 0.5s processing time while the network is saturated.
> It seems that the transaction metadata, despite being sent to HBase, is not stored, so the final table size is the same with or without transactions.
> Is this metadata encoding and sending behaviour expected? This is making Tephra unusable, at least with only 1Gbps of bandwidth.
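
The figures in the description can be checked with a quick back-of-the-envelope calculation. The sketch below is only an estimate: it assumes every put in the batch carries the same 104171-byte serialized transaction, and that the full 1 Gbps link is available for payload (protocol overhead and the put data itself are ignored). The class and method names are illustrative and are not part of Tephra or HBase.

```java
public class TxOverheadEstimate {
    // Figures taken from the report above.
    static final long SERIALIZED_TX_BYTES = 104_171L;        // size of one serialized Transaction
    static final long PUTS_PER_BATCH = 10_000L;              // approximate batch size
    static final long LINK_BITS_PER_SECOND = 1_000_000_000L; // 1 Gbps

    // Total transaction metadata sent when every put carries its own copy.
    static long totalOverheadBytes(long txBytes, long puts) {
        return txBytes * puts;
    }

    // Lower bound on transfer time, assuming the link is used only for this payload.
    static double transferSeconds(long bytes, long linkBitsPerSecond) {
        return (bytes * 8.0) / linkBitsPerSecond;
    }

    public static void main(String[] args) {
        long total = totalOverheadBytes(SERIALIZED_TX_BYTES, PUTS_PER_BATCH);
        double gib = total / (1024.0 * 1024.0 * 1024.0);
        double seconds = transferSeconds(total, LINK_BITS_PER_SECOND);
        System.out.printf("overhead per batch: %d bytes (%.2f GiB)%n", total, gib);
        System.out.printf("minimum transfer time at 1 Gbps: %.1f s%n", seconds);
    }
}
```

Under these assumptions the metadata comes to 1,041,710,000 bytes per batch (about 0.97 GiB, in line with the ~970MB quoted above), which needs roughly 8.3 s on a saturated 1 Gbps link. That accounts for most of the ~12 s observed per batch.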

