cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-6737) A batch statements on a single partition should not create a new CF object for each update
Date Wed, 19 Feb 2014 18:36:35 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-6737:
----------------------------------------

    Attachment: 6737.txt

Attaching a patch to make batch statements create only one CF and RowMutation object per partition.
On a relatively simple benchmark inserting a 10k-row batch into a single partition (using
the DataStax java driver, code here: https://gist.github.com/pcmanus/9098347; it isn't meant
to be fancy), I see an improvement of more than 20x in the best case with this patch (a single
batch insertion drops from >1.2 seconds to ~50-100ms).
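To make the idea concrete, here is a minimal sketch of grouping a batch's updates by partition key so that each distinct partition gets a single mutation object. `Update`, `PartitionMutation`, `perUpdate` and `perPartition` are hypothetical, simplified stand-ins, not Cassandra's actual `ColumnFamily`/`RowMutation` API and not the code in 6737.txt.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: these classes are NOT Cassandra's real
// ColumnFamily/RowMutation types, just enough structure to show the grouping.
public class BatchGrouping {

    /** One update in a batch: partition key, column name, value. */
    record Update(String partitionKey, String column, String value) {}

    /** Stand-in for the per-partition CF + RowMutation pair. */
    static final class PartitionMutation {
        final String partitionKey;
        final List<Update> updates = new ArrayList<>();

        PartitionMutation(String partitionKey) {
            this.partitionKey = partitionKey;
        }
    }

    /** Behaviour described in the issue: a new mutation object for every update. */
    static List<PartitionMutation> perUpdate(List<Update> batch) {
        List<PartitionMutation> out = new ArrayList<>();
        for (Update u : batch) {
            PartitionMutation m = new PartitionMutation(u.partitionKey());
            m.updates.add(u);
            out.add(m);
        }
        return out;
    }

    /** Grouped behaviour: one mutation object per distinct partition key,
     *  kept in the order each partition first appears in the batch. */
    static List<PartitionMutation> perPartition(List<Update> batch) {
        Map<String, PartitionMutation> byKey = new LinkedHashMap<>();
        for (Update u : batch) {
            byKey.computeIfAbsent(u.partitionKey(), PartitionMutation::new)
                 .updates.add(u);
        }
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        // 10k updates, all on the same partition "p1".
        List<Update> batch = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            batch.add(new Update("p1", "c" + i, "v" + i));
        }
        System.out.println(perUpdate(batch).size());    // 10000
        System.out.println(perPartition(batch).size()); // 1
    }
}
```

With 10,000 updates on one partition, the per-update path allocates 10,000 mutation objects while the grouped path allocates one that holds all 10,000 updates; a batch that spans several partitions still gets exactly one object per distinct key.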

Note that more optimization could be done for single-partition batches through some
special-casing, but this is a very simple start.


> A batch statements on a single partition should not create a new CF object for each update
> ------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6737
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6737
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 2.0.6
>
>         Attachments: 6737.txt
>
>
> BatchStatement creates a new ColumnFamily object (as well as a new RowMutation object)
> for every update in the batch, even if all those updates are actually on the same partition.
> This is particularly inefficient when bulkloading data into a single partition (which is not
> all that uncommon).



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
