cassandra-commits mailing list archives

From "Cliff Moon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-1891) large supercolumn deserialization invokes CSLM worst case scenario
Date Thu, 23 Dec 2010 19:05:45 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974698#action_12974698 ]

Cliff Moon commented on CASSANDRA-1891:
---------------------------------------

You are correct that CSLM does not override putAll, so there is no benefit to be had with
that approach.  That's why this patch uses the CSLM constructor which takes a SortedMap.
Internally this constructor invokes buildFromSorted, which iterates over the sorted map once
and builds the internal structures of the CSLM without searching for each insertion point.
In my use case I have rather large supercolumns that contain on the order of 100,000
subcolumns.  With the patch I see a throughput increase of 10 to 15% when deserializing the
supercolumns from disk.
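
To make the difference concrete, here is a minimal sketch (not the code from supercolumn.patch)
that contrasts the two loading paths using only java.util classes.  The class and method names
(BulkLoadSketch, loadOneByOne, loadBulk) are made up for illustration; a TreeMap stands in for
the already-sorted source.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class BulkLoadSketch {
    // One-at-a-time path: each put() has to locate its insertion point in the skip list.
    static ConcurrentSkipListMap<Integer, String> loadOneByOne(SortedMap<Integer, String> sorted) {
        ConcurrentSkipListMap<Integer, String> cslm = new ConcurrentSkipListMap<>(sorted.comparator());
        for (Map.Entry<Integer, String> e : sorted.entrySet()) {
            cslm.put(e.getKey(), e.getValue());
        }
        return cslm;
    }

    // Bulk path: the SortedMap constructor trusts the source ordering and appends entries in order.
    static ConcurrentSkipListMap<Integer, String> loadBulk(SortedMap<Integer, String> sorted) {
        return new ConcurrentSkipListMap<>(sorted);
    }

    public static void main(String[] args) {
        SortedMap<Integer, String> source = new TreeMap<>();
        for (int i = 0; i < 100_000; i++) {
            source.put(i, "subcolumn-" + i);
        }
        // Both paths produce the same contents; only the construction cost differs.
        System.out.println(loadBulk(source).equals(loadOneByOne(source)));
    }
}
```

Note that the static type of the argument matters: passing a reference typed as plain Map would
select the Map constructor, which goes through putAll and therefore the one-at-a-time path.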

Also it's unclear to me what benefit is to be had from using a TreeMap.  It's more efficient
to just deserialize directly into the CSLM, which is what ColumnSortedMap enables.  It isn't
a full implementation of SortedMap, just enough to enable the correct behavior in CSLM.
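
As a rough illustration of "just enough" SortedMap, the sketch below is a hypothetical
stand-in, not the ColumnSortedMap from the patch.  It assumes the CSLM constructor only needs
comparator() and entrySet().iterator() from its argument, which is how the OpenJDK
implementation behaves as far as I know but is not guaranteed by the SortedMap contract.  The
names MinimalSortedMapSketch and PresortedView are invented for this example.

```java
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class MinimalSortedMapSketch {
    // Hypothetical view over entries that are ALREADY sorted by the given comparator.
    static final class PresortedView<K, V> extends AbstractMap<K, V> implements SortedMap<K, V> {
        private final Comparator<? super K> comparator;   // null means natural ordering
        private final List<Map.Entry<K, V>> entries;

        PresortedView(Comparator<? super K> comparator, List<Map.Entry<K, V>> entries) {
            this.comparator = comparator;
            this.entries = entries;
        }

        @Override public Comparator<? super K> comparator() { return comparator; }

        @Override public Set<Map.Entry<K, V>> entrySet() {
            return new AbstractSet<Map.Entry<K, V>>() {
                @Override public Iterator<Map.Entry<K, V>> iterator() { return entries.iterator(); }
                @Override public int size() { return entries.size(); }
            };
        }

        // Everything below is deliberately unsupported in this sketch.
        @Override public SortedMap<K, V> subMap(K from, K to) { throw new UnsupportedOperationException(); }
        @Override public SortedMap<K, V> headMap(K to) { throw new UnsupportedOperationException(); }
        @Override public SortedMap<K, V> tailMap(K from) { throw new UnsupportedOperationException(); }
        @Override public K firstKey() { throw new UnsupportedOperationException(); }
        @Override public K lastKey() { throw new UnsupportedOperationException(); }
    }

    static ConcurrentSkipListMap<String, Integer> build(List<Map.Entry<String, Integer>> sortedEntries) {
        SortedMap<String, Integer> view = new PresortedView<>(null, sortedEntries);
        return new ConcurrentSkipListMap<>(view);   // selects the SortedMap constructor
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> entries =
            List.of(Map.entry("a", 1), Map.entry("b", 2), Map.entry("c", 3));
        System.out.println(build(entries));
    }
}
```

The view never copies the entries into an intermediate map, which is the point of deserializing
directly into the CSLM.  The caller is responsible for the ordering: if the entries are not
actually sorted, the resulting CSLM will be silently corrupt.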

As for the test failures, I've never been able to get Cassandra's unit tests to work locally,
so I had always assumed they were simply ornamental.

> large supercolumn deserialization invokes CSLM worst case scenario
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-1891
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1891
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Cliff Moon
>            Priority: Minor
>             Fix For: 0.7.1
>
>         Attachments: supercolumn.patch
>
>
> SuperColumn deserialization hits a worst case insert scenario for CSLM: inserting pre-sorted
> entries one at a time.  Inside of CSLM this requires scanning to the end of the list and doing
> a comparison at every step for every item inserted.  This patch supplies a SortedMap interface
> to the supercolumn deserialization.  CSLM will do a bulk insert from a SortedMap interface
> supplied in the constructor.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

