cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6936) Make all byte representations of types comparable by their unsigned byte representation only
Date Thu, 27 Mar 2014 11:06:15 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949150#comment-13949150 ]

Benedict commented on CASSANDRA-6936:
-------------------------------------

bq. Or I guess we could have some conversion of representation when receiving/sending values

I would settle for conversion when reading/writing from disk for these, but converting at
send/receive would be best, so that we can benefit from the changes in memory as well. That
said, our on-disk indexing is currently quite lacking, and improving that alone would be a
tremendous help.

bq. I don't see an easy way to have a bytes comparable representation of say IntegerType (since
it's variable length)

[http://www.dlugosz.com/ZIP2/VLI.html] looks to be one fairly simple encoding of this kind,
but there are others.
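
To make the idea concrete, here is a minimal sketch of a byte-comparable encoding for
arbitrary-precision integers. It is NOT the scheme from the linked page and not anything
that exists in Cassandra; it is a simpler length-prefixed layout chosen for illustration.
The class and method names are hypothetical, and it assumes values fit in at most 127 bytes
of two's complement. A header byte encodes sign and length, so that a plain unsigned
lexicographic comparison of the encoded bytes matches numeric order.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical sketch: order-preserving encoding of a variable-length integer.
final class ComparableVarInt {
    // Header byte: 0x80 + n for non-negative values, 0x80 - n for negative values,
    // where n is the length of the minimal two's complement body (1..127).
    // Longer negatives are smaller, so their header shrinks as n grows.
    static byte[] encode(BigInteger v) {
        byte[] body = v.toByteArray(); // minimal big-endian two's complement
        int n = body.length;
        if (n > 127) throw new IllegalArgumentException("value too large for this sketch");
        byte[] out = new byte[n + 1];
        out[0] = (byte) (v.signum() < 0 ? 0x80 - n : 0x80 + n);
        System.arraycopy(body, 0, out, 1, n);
        return out;
    }

    // Inverse of encode: the header gives the body length, so the conversion is lossless.
    static BigInteger decode(byte[] b) {
        int header = b[0] & 0xFF;
        int n = header > 0x80 ? header - 0x80 : 0x80 - header;
        return new BigInteger(Arrays.copyOfRange(b, 1, 1 + n));
    }

    // Plain unsigned lexicographic comparison, with no knowledge of the type.
    static int compareUnsigned(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            int c = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (c != 0) return c;
        }
        return a.length - b.length;
    }
}
```

Because the header fixes the body length, no encoded value is a prefix of another, which is
the property a trie-based index would rely on.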

bq.  there is the custom types

This is more of an issue. DecimalType is also tricky (though still achievable, I'm sure). It
_may_ be that we keep a slow fallback for those types we decide are too problematic to convert,
but it would be good to aim for a situation where we can have a fast route, and where we can
make on-disk optimisations. In an ideal world, though, we would simply not support indexing
(clustering/naming) on fields that can't be given this property (these are probably very few,
so this is probably not a major limitation).

bq.  I'm rather uncomfortable with doing complex bit manipulations of the user data... And
since we do return that representation to the user, it's not like we can change it to whatever
suits us

I'm not sure I follow your rationale for this. It seems an arbitrary distinction from all of
the other complex things we do to user data. All we do is shuffle around/encode/wrap user data.
This is exactly the kind of thing a database is supposed to do to make the user's life easier,
and in this case _we chose_ the encoding, so the user has no specific attachment to it. We
could easily create new types that require no conversion, and encourage users to switch for
safety/efficiency, but so long as any conversion is lossless, it shouldn't be a problem.
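
As an illustration of how small and lossless such a conversion can be, here is a sketch for
a fixed-width signed 32-bit integer. The class and method names are hypothetical and not
part of Cassandra. Flipping the sign bit of the big-endian representation makes unsigned
byte order agree with signed numeric order, and flipping it again restores the original
value exactly.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: lossless, order-preserving conversion for a signed int.
final class SignFlipInt {
    // Big-endian bytes with the sign bit inverted: negatives sort before positives
    // under unsigned byte comparison.
    static byte[] encode(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ 0x80000000).array();
    }

    // Flipping the sign bit a second time recovers the original value.
    static int decode(byte[] b) {
        return ByteBuffer.wrap(b).getInt() ^ 0x80000000;
    }

    // Unsigned lexicographic comparison of two equal-length encodings.
    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int c = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (c != 0) return c;
        }
        return 0;
    }
}
```

The representation returned to the user can stay as it is today; only the stored or
in-memory form would use the flipped bytes.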

Investigating this has raised another related issue, which is that I only now realised we
store a 4-byte length for every single value. This seems immensely wasteful, and at the same
time as any of these changes we should push this logic into AbstractType, so that those that
are fixed length, or only need a short length, or can otherwise encode their length, can decide
for themselves what size length to write.
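
A rough sketch of what letting each type choose its own length encoding could look like
follows. The ValueWriter interface and the factory methods are hypothetical and are not the
actual AbstractType API; the sketch also ignores empty and null values, which a real
implementation would have to handle.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: each type decides how (or whether) to write a length prefix.
final class LengthPolicySketch {
    interface ValueWriter {
        void write(byte[] value, DataOutputStream out) throws IOException;
    }

    // Fixed-width types (e.g. a 4-byte int) need no length prefix at all.
    static ValueWriter fixed(int width) {
        return (value, out) -> {
            if (value.length != width) throw new IllegalArgumentException("expected " + width + " bytes");
            out.write(value);
        };
    }

    // Types known to be short can use an unsigned 16-bit length.
    static ValueWriter shortLength() {
        return (value, out) -> {
            if (value.length > 0xFFFF) throw new IllegalArgumentException("value too long");
            out.writeShort(value.length);
            out.write(value);
        };
    }

    // The behaviour described above: a 4-byte length for every value.
    static ValueWriter intLength() {
        return (value, out) -> {
            out.writeInt(value.length);
            out.write(value);
        };
    }

    // Helper for comparing the on-disk footprint of the different policies.
    static int serializedSize(ValueWriter writer, byte[] value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writer.write(value, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }
}
```

For a 4-byte value, the three policies produce 4, 6 and 8 bytes respectively, so the 4-byte
length prefix doubles the size of every int.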

> Make all byte representations of types comparable by their unsigned byte representation only
> --------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6936
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6936
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 3.0
>
>
> This could be a painful change, but is necessary for implementing a trie-based index,
> and settling for less would be suboptimal; it also should make comparisons cheaper all-round,
> and since comparison operations are pretty much the majority of C*'s business, this should
> be easily felt (see CASSANDRA-6553 and CASSANDRA-6934 for an example of some minor changes
> with major performance impacts). No copying/special casing/slicing should mean fewer opportunities
> to introduce performance regressions as well.
> Since I have slated for 3.0 a lot of non-backwards-compatible sstable changes, hopefully
> this shouldn't be too much more of a burden.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
