cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6934) Optimise Byte + CellName comparisons
Date Wed, 23 Apr 2014 17:21:18 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13978488#comment-13978488
] 

Sylvain Lebresne commented on CASSANDRA-6934:
---------------------------------------------

h6. 1st patch (NEG_INF/POS_INF):

* We need to make sure that we don't use Composites.EMPTY in slices (at least for the finish
bound), but that's still done for thrift and CQL2 due to the fromByteBuffer calls in CassandraServer#toInternalFilter,
cql.QueryProcess#{getSlice,filterFromSelect} and ThriftValidation#asIFilter.
* When the SliceQueryFilter is reversed, a slice start bound should compare after it's finish,
so we should use slice(POS_INF, NEG_INF) in that case. So I think all the POS_INF in SelectStatement
should be NEG_INF, and more generally mind reversed in every case (in the thrift/CQL2 methods
above, in SuperColumns, in SystemKeyspace, ...).  I think that also mean we need a ColumnSlice.ALL_COLUMNS_REVERSED,
or at least check we never use ALL_COLUMNS/ALL_COLUMNS_ARRAY in reversed slice (haven't checked).
* I'd feel much more comfortable we're not missing some case if we added an assertion/check
in SliceQueryFilter that we never use Composites.EMPTY as finish bound (or start bound if
reversed). Ideally we'd even refuse it for start bound (finish when reversed), but I think
that would require to fix a bunch of place where we don't use Composite#start() (which is
somewhat wrong in the first place but well) for the start bound (there's at least SelectStatement#buildBound
but likely others).
* In ColumnSlice#isAlwaysEmpty, I don't think we need to test {{!start.isEmpty()}} anymore
(and since the method handle both forward and reversed slices, testing only one of {{start}}
or {{finish}} actually feels wrongs).
* Nit: In AbstractCType.compare, I'd remove "only one can have an EOC" because I think the
code works even if that's not the case and I'd rather not make that an "official" assumption
(it's probably always true, but there is no point relying on it).
* Nit: many comparators test their arguments for {{null}} which I'm pretty sure is not necessary
(it better be since a handful of comparator don't do it). Could be nice to remove this from
all comparator while we're removing useless checks (or let say at least for consistency).
* Nit: A small comment on what prefixComparisonResult is would be useful, it's not obvious
without going looking for how it's used.
* Nit: Slightly uncomfortable relying on EOC enum ordinal() (I'll grant that there is little
reason to ever reorder than enum, but still feel like a bad habit). I'd prefer adding a specific
field.
* Nit: Unused imports in SimpleDenseCellNameType/SimpleSparseCellnameType
* Nit: No newline between a field and a ctor is not very consistent with the code base :)


h6. 2nd patch (BTree change):

Looks good to me, though I've found it harder that needs to in NodeBuilder.update() to convince
myself that POSITIVE_INFINITY wasn't ending up in a call to addNewKey/replaceNewKey. I'd prefer
special casing it entirely: it's a very small code duplication for imo more clarity. And in
general I've found the beginning of that method a tad hard to follow ('set i negative, so
that found is false and then negate back i right away' is a tad contrived, as well as piecing
together when owns is false exactly). Anyway, I've pushed a small commit with a suggestion
(which "I think" is equivalent but feels clearer) at https://github.com/pcmanus/cassandra/commits/6934.

h6. 3rd patch (isByteOrderComparable):

* Int32Type, LongType and TimestampType are not byteOrderComparable if I'm not mistaken (because
they are signed). I don't CompositeType is either unless it has only one element, even if
all it's element are byteOrderComparable (because elements are not guaranteed to have the
same length).
* Can't we move the isByteOrderComparable field in AbstractCType and use it in it's compare,
which I think remove the need for adding one in AbstractCompoundCellNameType.
* In CompoundSparseCellNameType ctor, the collection type isn't taken into account, we should
iterate over {{fullType}} (which includes columnNameType too as it happens).

* Nit: in the added compare, having {{i}} declared outside of the loops looks odd (style-wise)
to me. Are you attached to that?


I'll note that the branch breaks a lot of the cql dtests (cql_tests.py), so probably a good
idea to check those to make some obvious case hasn't been missed.


> Optimise Byte + CellName comparisons
> ------------------------------------
>
>                 Key: CASSANDRA-6934
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6934
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 2.1.0
>
>
> AbstractCompositeType is called a lot, so deserves some heavy optimisation. SimpleCellNameType
can be optimised easily, but should explore other potential optimisations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message