cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-3141) SSTableSimpleUnsortedWriter call to ColumnFamily.serializedSize iterate through the whole columns
Date Wed, 07 Sep 2011 14:41:12 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098993#comment-13098993 ]

Sylvain Lebresne commented on CASSANDRA-3141:
---------------------------------------------

Strictly speaking, this doesn't work correctly: if you add a column whose name matches
an existing column, the serialized size won't be computed correctly.
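
To make the problem concrete, here is a minimal sketch of a blindly incremented cached size.
The class `NaiveCachedSizeRow`, its methods, and the "name bytes + value bytes" size formula
are all hypothetical simplifications; this is not Cassandra's ColumnFamily and not the code
in the attached patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a row of columns; NOT Cassandra's ColumnFamily.
class NaiveCachedSizeRow {
    private final TreeMap<String, byte[]> columns = new TreeMap<>();
    private long cachedSize = 0; // incremented blindly on every add

    // Assumed per-column cost: name bytes + value bytes.
    // Real serialization carries more overhead (timestamps, lengths, flags).
    static long columnSize(String name, byte[] value) {
        return name.getBytes(StandardCharsets.UTF_8).length + value.length;
    }

    void addColumn(String name, byte[] value) {
        columns.put(name, value);              // replaces any column with the same name
        cachedSize += columnSize(name, value); // the replaced column's size is never subtracted
    }

    long cachedSize() {
        return cachedSize;
    }

    // Full recomputation: walks every column, which is the cost the issue complains about.
    long serializedSize() {
        long size = 0;
        for (Map.Entry<String, byte[]> e : columns.entrySet()) {
            size += columnSize(e.getKey(), e.getValue());
        }
        return size;
    }

    public static void main(String[] args) {
        NaiveCachedSizeRow row = new NaiveCachedSizeRow();
        row.addColumn("c1", new byte[10]);
        row.addColumn("c1", new byte[10]); // same name: overwrite
        System.out.println("cached=" + row.cachedSize() + " actual=" + row.serializedSize());
    }
}
```

Under these assumptions, adding "c1" twice leaves one 12-byte column in the row, but the
cached counter reports 24.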

Now we could say that this doesn't matter much, because:
  # If you use SSTableSimpleUnsortedWriter (SSTSUW) in cases where you update the same column
    a lot, you're probably doing it wrong.
  # Even when that happens, the only consequence is that you will 'flush to disk' more often
    than you otherwise would, which isn't necessarily a big deal.
  # It is an estimate anyway.

That being said, I wonder if this call to serializedSize() is really that costly. Maybe it
adds a little cost when you 'reopen' a row multiple times, but you are not supposed to do
that often (if ever).
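
For reference, an incremental counter can stay exact if it subtracts the size of the column
it replaces. The sketch below is only an illustration under assumptions: `ReplacementAwareRow`
and its "name bytes + value bytes" size formula are hypothetical, the last write always wins
(no timestamp reconciliation), and none of this is taken from the attached patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a row of columns; NOT Cassandra's ColumnFamily.
class ReplacementAwareRow {
    private final TreeMap<String, byte[]> columns = new TreeMap<>();
    private long cachedSize = 0;

    // Assumed per-column cost: name bytes + value bytes (a simplification).
    static long columnSize(String name, byte[] value) {
        return name.getBytes(StandardCharsets.UTF_8).length + value.length;
    }

    void addColumn(String name, byte[] value) {
        // TreeMap.put returns the previous value for this key, or null if there was none.
        byte[] previous = columns.put(name, value);
        if (previous != null) {
            cachedSize -= columnSize(name, previous); // undo the replaced column's contribution
        }
        cachedSize += columnSize(name, value);
    }

    // Constant-time read of the running total.
    long cachedSize() {
        return cachedSize;
    }

    // Full recomputation, kept only to check the cached value against.
    long serializedSize() {
        long size = 0;
        for (Map.Entry<String, byte[]> e : columns.entrySet()) {
            size += columnSize(e.getKey(), e.getValue());
        }
        return size;
    }

    public static void main(String[] args) {
        ReplacementAwareRow row = new ReplacementAwareRow();
        row.addColumn("c1", new byte[10]);
        row.addColumn("c1", new byte[4]); // overwrite with a smaller value
        row.addColumn("c2", new byte[3]);
        System.out.println("cached=" + row.cachedSize() + " actual=" + row.serializedSize());
    }
}
```

The extra lookup of the previous value comes for free from the map insertion here; whether
the same holds for the real column container is something the sketch does not establish.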

> SSTableSimpleUnsortedWriter call to ColumnFamily.serializedSize iterate through the whole columns
> -------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3141
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3141
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.3
>            Reporter: Benoit Perroud
>            Priority: Minor
>             Fix For: 0.8.6
>
>         Attachments: CachedSizeCF.patch
>
>
> Every time newRow is called, serializedSize iterates through all the columns to compute
> the size.
> Once 1'000'000 columns exist in the CF, it becomes painful to repeat the same computation
> at every iteration. Caching the size and incrementing it when a Column is added could be
> an option.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
