cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joshua McKenzie (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)
Date Tue, 07 Jul 2015 19:27:09 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617221#comment-14617221
] 

Joshua McKenzie commented on CASSANDRA-6477:
--------------------------------------------

h6. Overall:
Needs comments. Class level, javadoc where appropriate, etc.

h6. {{CFMetaData}}
* duplicate entry for comparison if triggers added in CFMetadata.triggers
* nit: space after CFMetaData.getMaterializedViews function end

h6. {{MaterializedViewManager}}
* buildIfRequired: scope of whether it's required or skipped isn't in this method or within
view.build(), so it looks like it's just building (or at least submitting the builder to the
CompactionManager)
* You're using SystemKeyspace.setIndexRemoved to mark the MV removed in removeMaterializedView
but not the parallel setIndexBuilt to set them as built.
* allViews has unnecessary genericity
* nits:
** Some unused imports
** Consider renaming reload to indicate what it's reloading (e.g. reloadViews, reloadChickens,
reloadDolphins)

h6. {{MaterializedView}}
* createForDeletionInfo:
** Consider caching/precomputing the results of CFMetaData.hasComplexColumns rather than iterating
through them twice on each call (once on CFMetaData.hascomplexColumns, again in for loop in
createForDeletionInfo)
*** Perhaps hooking into addOrReplaceColumnDefinition, removeColumnDefinition and updating
MV's with the data, keep a set inside MaterializedView and reference that
* Don't need MaterializedView.reload() as it's just a passthrough to build and used in 1 place
* Tighten up scoping on member variables - some package private that can be explicitly private
* ctor: Duplication in the ctor in treatment of MVDefinition partition and clustering columns
- could use a slight refactor to a function.
* createTombstone / createComplexTombstone: refactor out building viewClusteringValues into
method to remove duplication
* targetPartitionKey: collapse spacing on:
{noformat}
return viewCfs
   .partitioner
   .decorateKey(CFMetaData
                  .serializePartitionKey(viewCfs
                                       .metadata
                                       .getKeyValidatorAsClusteringComparator()
                                       .make(partitionKey)));
{noformat}
to something like:
{noformat}
return viewCfs.partitioner.decorateKey(CFMetaData.serializePartitionKey(viewCfs.metadata
                                                                        .getKeyValidatorAsClusteringComparator()
                                                                        .make(partitionKey)));
{noformat}
* createPartitionTombstonesForUpdates: Can simply return mutation @ end of function
* createForDeletionInfo: unused parameter consistency
* Inconsistent naming - using mutationUnits scattered throughout vs. LiveRowState
* Rename {{query}} to {{queryMaterializedViewData}} or something similar that denotes what
you're querying. Perhaps {{buildMVData}}
* nits:
** Some unused imports
** ctor: Change nonPrimaryKeyCol to allPrimaryKeyCol as true, flip to false when non found
so we're not assigning a double-negative to targetHasAllPrimaryKeyColumns
** extra space in MaterializedView.createForDeletionInfo
** ctor declaration fits on 1 line < 120 char, same for some other lines that have been
wrapped
** hte in javadoc for targetPartitionKey

h6. {{MaterializedViewDefinition}}
* Consider caching sets of columns in the MV rather than pulling from CFMetaData and iterating.
This would allow for faster lookup and simpler code than having to iterate across all members
(MaterializedViewDefinition.selects for instance).
* Consider using Sets of columns rather than Lists internally, would remove a lot of O(N)
lookups and duplicate code logic scattered throughout the class (selects, renameColumn, etc)
* selects(): if included.isEmpty() is apparently an indicator that it includes all. We should
1) comment that and 2) wrap a method around that behavior, e.g. 'boolean isAllInclusive()'
that documents that behavior in the code.
* nits:
** I prefer copy constructors to a .copy() method and I think I've seen us err on the side
of the copy constructor. I could be wrong here.
** assert included != null and !isEmpty on ctor
** extra whitespace in renameColumn @ top of method, @ end of method

h6. {{LiveRowState}}
* The name conflates with LivenessInfo (at least for me) which we discussed on IRC. My biggest
hurdle to grokking this class was un-plugging that and re-plugging the idea that it's really
about materializing the data for the MaterializedView. Perhaps rename to something like {{TemporalRow}},
with {{LRSCell}} becoming {{TemporalCell}}? While LRSCell doesn't really have any temporal
data above and beyond a general Cell, it does contain its own {{reconcile}} that takes temporality
into account which makes sense.
* Instead of {{addUnit}} / {{getExistingUnit}} in LiveRowState.Set, if going with the above
these could be named {{addRow}}, {{getExistingRow}}
* Think we can get rid of getInternedUnit entirely and just have a section in addUnit to initialize
empty entries
* Consider having LRSCell implement db.rows.Cell and adding a {{Conflicts.resolveCells(Cell
left, Cell right, int nowInSec)}} method that passes through to {{Conficts.resolveRegular}}.
Simplifies up a few different users of the method throughout the code-base.
* Can do away w/PrivateResolver vs. Resolver. LRSCell being private should prevent non-class
anonymous instantiations of the interface.
* Various unused methods and constructors
* Comment on whether or not retention of ordering is important in {{baseSlice}} as you could
do away w/interim ByteBuffer array if not. I believe it is, but worth clarifying.
* nit: whitespace inconsistencies in {{values(...)}}

I definitely have more to review but figured I'd get the feedback I have thus far into here.
I'm considering doing subsequent reviews as a branch I push to github w/comments inline (similar
to what Benedict's been doing lately) with feedback as putting this quantity of feedback in
Jira comments is a burden for both you and I - thoughts?

> Materialized Views (was: Global Indexes)
> ----------------------------------------
>
>                 Key: CASSANDRA-6477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>            Assignee: Carl Yeksigian
>              Labels: cql
>             Fix For: 3.0 beta 1
>
>         Attachments: test-view-data.sh
>
>
> Local indexes are suitable for low-cardinality data, where spreading the index across
the cluster is a Good Thing.  However, for high-cardinality data, local indexes require querying
most nodes in the cluster even if only a handful of rows is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message