cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] Commented: (CASSANDRA-1938) Use UUID as node identifiers in counters instead of IP addresses
Date Tue, 08 Mar 2011 08:51:59 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003863#comment-13003863 ]

Sylvain Lebresne commented on CASSANDRA-1938:
---------------------------------------------

bq. Is there any way to deal with this by tracking the CounterColumns in -Statistics.db?

We don't even have to go there. We can special-case counter CFs if need be, but it won't be
super pretty. In particular, you'll have to special-case both PreCompactedRow and
LazyCompactedRow, because the main point is to use the columns' updateDigest() function to
compute the PreCompactedRow digests instead of using the raw bytes. If you special-case it
there, you'll have to mirror this in the lazy case, that is, use the columns' updateDigest()
in the counter case but the raw bytes otherwise.
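A minimal sketch of the special case described above. It assumes, only for illustration, that a
counter column's serialized form carries replica-local bytes that its updateDigest() leaves out,
so a raw-byte digest and an updateDigest()-based digest disagree. RowDigestSketch, Column,
CounterColumn and rowDigest() are hypothetical names, not Cassandra's classes. MD5 is just an
example algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: these types are NOT Cassandra's real classes.
public class RowDigestSketch {

    // Hypothetical column abstraction: raw serialized bytes plus a digest hook.
    interface Column {
        byte[] serializedBytes();
        void updateDigest(MessageDigest digest);
    }

    // Assumed counter column whose serialized form includes replica-local
    // bytes that updateDigest() deliberately leaves out of the digest.
    static final class CounterColumn implements Column {
        private final String name;
        private final long total;
        private final byte[] localState;

        CounterColumn(String name, long total, byte[] localState) {
            this.name = name;
            this.total = total;
            this.localState = localState.clone();
        }

        @Override
        public byte[] serializedBytes() {
            byte[] head = (name + "=" + total + ";").getBytes(StandardCharsets.UTF_8);
            byte[] out = Arrays.copyOf(head, head.length + localState.length);
            System.arraycopy(localState, 0, out, head.length, localState.length);
            return out;
        }

        @Override
        public void updateDigest(MessageDigest digest) {
            // Only the name and the logical value contribute to the digest.
            digest.update((name + "=" + total).getBytes(StandardCharsets.UTF_8));
        }
    }

    // The branch both compaction paths (eager and lazy) would have to share:
    // counter CFs digest via updateDigest(), every other CF hashes raw bytes.
    static byte[] rowDigest(List<? extends Column> columns, boolean isCounterCf) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("MD5"); // example algorithm only
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        for (Column column : columns) {
            if (isCounterCf) {
                column.updateDigest(digest);
            } else {
                digest.update(column.serializedBytes());
            }
        }
        return digest.digest();
    }
}
```

The cost being objected to is visible here: the isCounterCf branch would have to appear in
both the precompacted and the lazy digest paths, and the two must stay in sync.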

However, what I'm trying to say is that I'm not super convinced the echoRow 'optimization' is
really that useful anymore (I'm not talking about cleanup, where echoing is useful and is not
changed by this patch). During compaction, we echo a row if it's in only one of the sstables
we're compacting *but* also exists in an sstable we are not compacting (otherwise we still
deserialize it for tombstone reclaiming). I would imagine most rows are either updated often
(in which case it will be rare for only one of the sstables being compacted to contain them)
or rarely updated (in which case we'll still deserialize them for tombstone reclaiming most of
the time).
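The echo condition above can be sketched as a small decision function. EchoDecisionSketch,
RowAction and decide() are hypothetical names. The sketch assumes the set of live sstables
containing the row key is already known, and it leaves out how that set is obtained.

```java
import java.util.Set;

// Hypothetical sketch of the echo-vs-deserialize decision; names and
// signatures are illustrative, not Cassandra's API.
public class EchoDecisionSketch {

    enum RowAction { ECHO_RAW_BYTES, DESERIALIZE }

    /**
     * @param sstablesWithRow every live sstable assumed to contain the row key
     * @param compacting      the sstables taking part in this compaction
     */
    static RowAction decide(Set<String> sstablesWithRow, Set<String> compacting) {
        long inCompaction = sstablesWithRow.stream().filter(compacting::contains).count();
        boolean existsOutside = sstablesWithRow.stream().anyMatch(s -> !compacting.contains(s));

        // A single version in this compaction, and the row also lives outside
        // it, so its tombstones must be kept anyway: copy the bytes as-is.
        if (inCompaction == 1 && existsOutside) {
            return RowAction.ECHO_RAW_BYTES;
        }
        // Either several versions must be merged, or the row lives only in the
        // sstables being compacted and its tombstones may be reclaimable.
        return RowAction.DESERIALIZE;
    }
}
```

For frequently updated rows inCompaction is usually greater than 1. For rarely updated rows
existsOutside is usually false. Either way, the echo branch is the uncommon one.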

I'm also not sure we'll keep this 'optimization' forever anyway. If we add checksums, for
example (which we should, imho, sooner rather than later), echoing data may not be desirable.

So given all this, and given that even in the (I believe) rare cases where it is useful it is
not on a critical path, I'd advise against polluting the code for this.

That being said, if I'm the only one to feel that way, it's doable.


> Use UUID as node identifiers in counters instead of IP addresses 
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-1938
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1938
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 0.8
>
>         Attachments: 0001-Use-uuid-instead-of-IP-for-counters.patch, 0002-Merge-old-shard-locally.patch,
>                      0003-Thrift-change-to-CfDef.patch, 1938_discussion
>
>   Original Estimate: 56h
>  Remaining Estimate: 56h
>
> The use of IP addresses as node identifiers in the partition of a given
> counter is fragile. Changing a node's IP address can result in data
> loss. This patch proposes to use UUIDs instead.
> NOTE: this breaks the on-disk file format (for counters)
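As a rough illustration of the issue description, here is a simplified model, not Cassandra's
actual counter context format. A counter is stored as one shard per node, keyed by a node
identifier, and replicas reconcile by keeping, for each node, the shard with the higher clock.
CounterShardsSketch, its Shard class and the clock-based merge rule are assumptions made for
illustration. Keying shards by a java.util.UUID that the node generates once (and is assumed to
persist) means the key survives a change of IP address. An IP-keyed shard would become a
different key.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical, simplified model of a counter partitioned by node identifier.
public class CounterShardsSketch {

    // One shard per node: a logical clock and that node's running count.
    static final class Shard {
        final long clock;
        final long count;

        Shard(long clock, long count) {
            this.clock = clock;
            this.count = count;
        }
    }

    // Shards keyed by a node UUID rather than by an IP address.
    private final Map<UUID, Shard> shards = new HashMap<>();

    // Apply a local increment to the shard owned by nodeId.
    void increment(UUID nodeId, long delta) {
        Shard old = shards.get(nodeId);
        long clock = (old == null) ? 1 : old.clock + 1;
        long count = (old == null) ? delta : old.count + delta;
        shards.put(nodeId, new Shard(clock, count));
    }

    // Reconcile with another replica: per node, keep the shard with the higher clock.
    void merge(CounterShardsSketch other) {
        other.shards.forEach((id, shard) ->
                shards.merge(id, shard, (mine, theirs) -> mine.clock >= theirs.clock ? mine : theirs));
    }

    // The counter's value is the sum of all node shards.
    long total() {
        return shards.values().stream().mapToLong(s -> s.count).sum();
    }
}
```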

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
