tephra-dev mailing list archives

From Micael Capitão <micael.capi...@xpand-it.com>
Subject TransactionCodec poor performance
Date Wed, 31 May 2017 08:49:09 GMT
Hi all,

I've been testing Tephra 0.11.0 for a project that may need transactions 
on top of HBase, and I find its performance very poor, for instance for 
a bulk load. Let's not discuss why I am doing a bulk load with 
transactions.

In my use case I generate batches of ~10000 elements and insert them 
with the *put(List<Put> puts)* method. There are no concurrent writers 
or readers.
If I do the put without transactions, it takes ~0.5s. If I use 
*TransactionAwareHTable*, it takes ~12s.
I've traced the performance killer to 
*addToOperation(OperationWithAttributes op, Transaction tx)*, more 
specifically to *txCodec.encode(tx)*.

I created a TransactionAwareHTableFix with the *addToOperation(txPut, 
tx)* call commented out and used it in my code, and each batch started 
to take ~0.5s.

I noticed that *TransactionCodec* instantiates a new TSerializer and 
TDeserializer on each call to encode/decode. I tried instantiating the 
serializer/deserializer once, in the constructor, but each of my batches 
still took the same ~12s.
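A minimal sketch of the reuse described above, in which the encoder is built once when the codec is constructed rather than on every call. `ReusingCodec` and `ExpensiveEncoder` are hypothetical stand-ins, not Tephra or Thrift classes, and the sketch assumes a single codec instance is never shared between threads (a serializer holding internal state is generally not safe to share).

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical codec: the encoder is created once, when the codec is
// constructed, and reused on every encode() call.
// Assumption: one codec instance is used by one thread only.
class ReusingCodec {
    private final ExpensiveEncoder encoder = new ExpensiveEncoder();

    byte[] encode(long readPointer, long writePointer) {
        return encoder.encode(readPointer, writePointer);
    }

    // Counts how many encoders get built while encoding `calls` times.
    static int constructionsFor(int calls) {
        int before = ExpensiveEncoder.CONSTRUCTED.get();
        ReusingCodec codec = new ReusingCodec();
        for (int i = 0; i < calls; i++) {
            codec.encode(i, i + 1);
        }
        return ExpensiveEncoder.CONSTRUCTED.get() - before;
    }

    public static void main(String[] args) {
        System.out.println("encoders constructed: " + constructionsFor(10_000));
    }
}

// Hypothetical stand-in for a serializer that is costly to construct.
class ExpensiveEncoder {
    static final AtomicInteger CONSTRUCTED = new AtomicInteger();

    ExpensiveEncoder() {
        CONSTRUCTED.incrementAndGet();
    }

    // Encodes two 64-bit pointers as 16 big-endian bytes.
    byte[] encode(long readPointer, long writePointer) {
        return ByteBuffer.allocate(16).putLong(readPointer).putLong(writePointer).array();
    }
}
```

As reported in this message, this kind of reuse alone left each batch at ~12s, which suggests the cost lies in the size of what is encoded rather than in constructing the serializer.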

Further investigation showed that the Transaction instance, once encoded 
by the TransactionCodec, is 104171 bytes long. So in my batch of 10000 
elements, ~1 GB (about 0.97 GiB) is metadata. Is that supposed to happen?
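The metadata total follows from multiplying the encoded size of one transaction by the number of Puts in a batch. A small sketch of that arithmetic, using only the figures from this message (the class and method names are illustrative):

```java
// Total encoded-transaction metadata for one batch, using the figures
// reported in this message: 104171 bytes per Put, 10000 Puts per batch.
class MetadataEstimate {
    static long totalBytes(long bytesPerOp, long opsPerBatch) {
        return bytesPerOp * opsPerBatch;
    }

    public static void main(String[] args) {
        long total = totalBytes(104_171L, 10_000L);
        double mib = total / (1024.0 * 1024.0);
        double gib = total / (1024.0 * 1024.0 * 1024.0);
        System.out.printf("%d bytes = %.1f MiB = %.2f GiB%n", total, mib, gib);
    }
}
```

That comes to 1,041,710,000 bytes per batch, i.e. about 993 MiB or 0.97 GiB.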


Regards,

Micael Capitão
