cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10045) Sparse/Dense decision should be made per-row, not per-file
Date Mon, 24 Aug 2015 14:10:45 GMT


Sylvain Lebresne commented on CASSANDRA-10045:

I like the idea. A few remarks on the patch:
* In {{Columns}}, since we introduce a {{size()}} (which I'm good with), I'd prefer removing
* {{EncodingStats.avgColumnSetPerRow}} isn't used anymore (if I'm not mistaken) so we should
remove it.
* In {{Row}}, I don't like having {{columns()}} return a {{Columns}} but {{actualColumns()}}
return a {{Collection<ColumnDefinition>}} (the latter method is in need of a javadoc
also). It's an important API and API consistency is important so I'd prefer one of (with a
minor preference for the 2nd one for the sake of keeping the {{Row}} interface smaller):
  ## make {{actualColumns}} return a {{Columns}} and pay the allocation.
  ## don't add that method and just call {{Collections2.transform(row, ColumnData::column)}}
at the call sites.
* In {{BufferCell.Serializer}}, can we "rebase" the "MASK" (from 0x01). The patch is breaking
serialization anyway.
* In {{UnfilteredSerializer}} class javadoc, the bullet about complex cells starts by "while
each ...", where the "while" should be removed (bad copy-paste). More importantly, the description
is not up-to-date: we now encode the number of cells up-front and there is no {{<emptyCell>}}
at the end.
* Any reason for changing the name of the row inserted in {{ScrubTest}}? Doesn't seem related
to this patch in any way.
* Nit: in {{UnfilteredSerializer.readComplexColumn/skipComplexColumn}}, a {{for}} loop would
be more idiomatic (of the code base if not of Java).

> Sparse/Dense decision should be made per-row, not per-file
> ----------------------------------------------------------
>                 Key: CASSANDRA-10045
>                 URL:
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Minor
>             Fix For: 3.0 beta 2
> Marking this as beta 1 in the hope I have time to rustle it up and get it reviewed beforehand.
If I do not, I will let it slide, but our behaviour right now is not brilliant for workloads
with a variance in density, and it should not be challenging to make a more targeted decision.
> We can also make use of CASSANDRA-9894 to make column encoding more efficient in many,
even dense, cases.

This message was sent by Atlassian JIRA

View raw message