tephra-dev mailing list archives

From "Andreas Neumann (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TEPHRA-232) Transaction metadata sent on each put is too big
Date Fri, 08 Sep 2017 20:52:00 GMT

    [ https://issues.apache.org/jira/browse/TEPHRA-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159316#comment-16159316 ]

Andreas Neumann commented on TEPHRA-232:

Two issues here:
1. The full transaction is encoded and sent over and over again (addressed in TEPHRA-247 and TEPHRA-248).
2. Only the write pointer, which is much smaller, is needed for puts (addressed in TEPHRA-234).

We'll keep this open to track it, but will push fixes individually for the other Jiras. 
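The write-pointer fix (TEPHRA-234) helps because a put only needs the transaction's 8-byte write pointer, not the full serialized Transaction object, which the report below measures at 104171 bytes. A minimal sketch of the size difference in plain Java; the encoding and the batch size here are illustrative assumptions, not Tephra's actual wire format:

```java
import java.nio.ByteBuffer;

public class WritePointerSize {
    // Hypothetical encoding: only the 8-byte write pointer travels with each put,
    // instead of the full serialized Transaction (~104171 bytes per the report).
    static byte[] encodeWritePointer(long writePointer) {
        return ByteBuffer.allocate(Long.BYTES).putLong(writePointer).array();
    }

    public static void main(String[] args) {
        byte[] attr = encodeWritePointer(1503999999L);
        long fullTxBytes = 104_171L;  // serialized Transaction size from the report
        long batch = 10_000L;         // puts per batch from the report
        System.out.println("per-put attribute: " + attr.length + " bytes");
        System.out.println("batch overhead, write pointer only: " + attr.length * batch + " bytes");
        System.out.println("batch overhead, full transaction:   " + fullTxBytes * batch + " bytes");
    }
}
```

Per batch, that is 80 KB of metadata instead of roughly a gigabyte.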

> Transaction metadata sent on each put is too big
> ------------------------------------------------
>                 Key: TEPHRA-232
>                 URL: https://issues.apache.org/jira/browse/TEPHRA-232
>             Project: Tephra
>          Issue Type: Bug
>    Affects Versions: 0.11.0-incubating, 0.12.0-incubating
>         Environment: HBase 1.2.0-cdh5.11
> CentOS 7.3
> 4x machines
> Bandwidth between machines 1Gbps
>            Reporter: Micael Capitão
>            Assignee: Poorna Chandra
>            Priority: Minor
> I've been testing Tephra 0.11.0 (and more recently 0.12.0) for a project that may need
transactions on top of HBase, and I find its performance, for instance for a bulk load, very
poor. Let's not discuss why I am doing a bulk load with transactions.
> In my use case I am generating batches of ~10000 elements and inserting them with the
*put(List<Put> puts)* method. There are no concurrent writers or readers.
> If I do the put without transactions it takes ~0.5s. If I use the *TransactionAwareHTable*
it takes ~12s.
> In both cases the network bandwidth is fully utilised.
> I've tracked down the performance killer to the *addToOperation(OperationWithAttributes
op, Transaction tx)* method on the TransactionAwareHTable.
> I've created a TransactionAwareHTableFix with the *addToOperation(txPut, tx)* call commented
out, used it in my code, and each batch went back to taking ~0.5s.
> Then I checked what was being done inside the *addToOperation* method and verified that
the issue has to do with the serialization of the Transaction object. The serialized
Transaction object is 104171 bytes long. Since this happens for each put, my batch of
~10000 elements carries ~970MB of serialized transactions, which explains the 12s
vs 0.5s while the network is saturated.
> It seems that the transactions' metadata, despite being sent to HBase, is not stored,
so the final table size, with or without transactions, is the same.
> Is this metadata encoding and sending behaviour expected? It makes Tephra unusable,
at least with only 1Gbps of bandwidth.
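The reporter's numbers are internally consistent: ~10000 puts times ~104 KB of transaction metadata is roughly a gigabyte, which at 1 Gbps needs about 8 extra seconds on the wire, in line with the observed 12s versus ~0.5s. A quick back-of-the-envelope check, with all figures taken from the report above:

```java
public class BandwidthCheck {
    // Minimum extra wire time for the serialized-transaction overhead of one batch.
    static double extraSeconds(long txBytes, long puts, double linkBitsPerSec) {
        return (double) txBytes * puts * 8 / linkBitsPerSec;
    }

    public static void main(String[] args) {
        long txBytes = 104_171L;  // serialized Transaction size (bytes), from the report
        long puts = 10_000L;      // elements per batch, from the report
        double link = 1e9;        // 1 Gbps between machines, from the report

        double totalMB = (double) txBytes * puts / 1e6;
        System.out.printf("extra payload per batch: ~%.0f MB%n", totalMB);
        System.out.printf("minimum extra transfer time at 1 Gbps: ~%.1f s%n",
                extraSeconds(txBytes, puts, link));
    }
}
```

This is a lower bound: it ignores serialization CPU cost and HBase RPC framing, so the observed ~11.5s gap is plausible.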

This message was sent by Atlassian JIRA
