tephra-dev mailing list archives

From Terence Yim <cht...@gmail.com>
Subject Re: TransactionCodec poor performance
Date Wed, 31 May 2017 09:29:36 GMT
Hi Micael,

Do you know if the invalid tx list inside the Transaction object is large?

Terence
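The question matters because the encoded Transaction carries the full invalid-transaction list, so a large invalid list inflates every encoded payload. A rough, purely illustrative back-of-the-envelope sketch, assuming the invalid list dominates the payload and each entry costs about 8 bytes (a long); the constants are guesses, not Tephra internals:

```java
// Illustrative arithmetic only: if the encoded Transaction is dominated by
// its invalid-tx list, and each invalid entry serializes to roughly 8 bytes
// plus some fixed framing, a ~104 KB payload suggests an invalid list on
// the order of ~13,000 entries. Both constants below are guesses.
public class InvalidListEstimate {
    static long estimateInvalidEntries(long encodedBytes) {
        long fixedOverhead = 200;   // guessed fixed fields + framing
        long bytesPerEntry = 8;     // one long per invalid tx id
        return (encodedBytes - fixedOverhead) / bytesPerEntry;
    }

    public static void main(String[] args) {
        // 104,171 bytes is the size Micael measured below.
        long entries = estimateInvalidEntries(104_171);
        System.out.println("~" + entries + " invalid tx entries");
    }
}
```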

> On May 31, 2017, at 1:49 AM, Micael Capitão <micael.capitao@xpand-it.com> wrote:
> 
> Hi all,
> 
> I've been testing Tephra 0.11.0 for a project that may need transactions on top of HBase,
and I find its performance, for instance on a bulk load, very poor. Let's not discuss why
I am doing a bulk load with transactions.
> 
> In my use case I am generating batches of ~10000 elements and inserting them with the
*put(List<Put> puts)* method. There are no concurrent writers or readers.
> If I do the put without transactions it takes ~0.5s. If I use the *TransactionAwareHTable*
it takes ~12s.
> I've tracked down the performance killer to be the *addToOperation(OperationWithAttributes
op, Transaction tx)*, more specifically the *txCodec.encode(tx)*.
> 
> I've created a TransactionAwareHTableFix with the *addToOperation(txPut, tx)* call commented
out, and used it in my code, and each batch started to take ~0.5s.
> 
> I've noticed that inside the *TransactionCodec* you were instantiating a new TSerializer
and TDeserializer on each call to encode/decode. I tried instantiating the ser/deser in the
constructor, but even that way each of my batches would take the same ~12s.
> 
> Further investigation has shown me that the Transaction instance, after being encoded
by the TransactionCodec, is 104,171 bytes long. So in my 10,000-element batch, ~970 MB
is metadata. Is that supposed to happen?
> 
> 
> Regards,
> 
> Micael Capitão
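For context, the per-Put encoding cost described above could hypothetically be avoided by encoding the transaction once per batch and attaching the same byte array to every Put. The sketch below is self-contained: `expensiveEncode` is a stand-in for `TransactionCodec.encode`, and the names and numbers are illustrative, not Tephra's actual implementation:

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch, not Tephra's actual code: encode the transaction once
// per batch and reuse the same byte[] for every operation, instead of
// re-running the codec for each Put. expensiveEncode stands in for
// TransactionCodec.encode(tx); the fake "encoding" is just a long in a buffer.
public class CachedTxEncoding {
    static int encodeCalls = 0;

    // Stand-in for the costly codec call; counts invocations.
    static byte[] expensiveEncode(long txId) {
        encodeCalls++;
        return ByteBuffer.allocate(8).putLong(txId).array();
    }

    public static void main(String[] args) {
        long txId = 42L;
        int batchSize = 10_000;

        // Encode once, then reuse the bytes for the whole batch.
        byte[] cached = expensiveEncode(txId);
        byte[][] attributes = new byte[batchSize][];
        for (int i = 0; i < batchSize; i++) {
            attributes[i] = cached;   // same array, no re-encoding
        }

        System.out.println("encode calls: " + encodeCalls);        // prints 1
        System.out.println(Arrays.equals(attributes[0], attributes[batchSize - 1]));
    }
}
```

Whether this is safe depends on whether the transaction (including its invalid list) can change mid-batch; the sketch assumes it cannot.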

