cassandra-dev mailing list archives

From DuyHai Doan <doanduy...@gmail.com>
Subject Re: [DISCUSS] CEP-7 Storage Attached Index
Date Tue, 18 Aug 2020 11:02:31 GMT
Thank you, Zhao Yang, for starting this topic.

After reading the short design doc, I have a few questions:

1) SASI was pretty inefficient at indexing wide partitions because the index
structure only retains the partition token, not the clustering columns. As
per the design doc, SAI has a row ID mapping to partition offset, so can we
hope that indexing wide partitions will be more efficient with SAI? One
detail that worries me is that at the beginning of the design doc, it is said
that the matching rows are post-filtered while scanning the partition. Can
you confirm or deny that SAI is efficient with wide partitions and provides
the partition offsets to the matching rows?

2) About space efficiency, one of the biggest drawbacks of SASI was the huge
space required for the index structure when using CONTAINS logic, because of
the decomposition of text columns into n-grams. Will SAI suffer from the same
issue in future iterations? I'm anticipating a bit here.

3) If I query using SAI and provide the complete partition key, will it be
more efficient than querying without the partition key? In other words, does
SAI provide any optimisation when the partition key is specified?
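
To make the concern in question 2 concrete, here is a rough Python sketch of why substring-style indexing gets expensive. It is a generic decomposition of a term into all of its substrings, not SASI's or SAI's actual code, and the helper names `ngrams` and `index_entries` are made up for illustration.

```python
def ngrams(term: str, min_n: int = 1) -> set[str]:
    """Return every distinct substring of `term` with length >= min_n.

    Hypothetical helper: a generic decomposition, not the SASI implementation.
    """
    return {
        term[i:j]
        for i in range(len(term))
        for j in range(i + min_n, len(term) + 1)
    }


def index_entries(terms: list[str], min_n: int = 1) -> int:
    """Count index entries if every substring of every term is stored."""
    return sum(len(ngrams(t, min_n)) for t in terms)


if __name__ == "__main__":
    # A term of length k has up to k * (k + 1) / 2 substrings, so the
    # number of index entries grows quadratically with the term length.
    for word in ["abc", "abcd", "partition"]:
        print(word, len(word), len(ngrams(word)))
```

With this kind of decomposition, a single column value can produce many more index entries than the one entry an exact-match index would need, which is the space blow-up I am worried about.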

Regards

Duy Hai DOAN

On Tue, 18 Aug 2020 at 11:39, Mick Semb Wever <mck@apache.org> wrote:

> >
> > We are looking forward to the community's feedback and suggestions.
> >
>
>
> What comes immediately to mind is testing requirements. It has been
> mentioned already that the project's testability and QA guidelines are
> inadequate to successfully introduce new features and refactorings to the
> codebase. During the 4.0 beta phase this was intended to be addressed, i.e.
> defining more specific QA guidelines for 4.0-rc. This would be an important
> step towards QA guidelines for all changes and CEPs post-4.0.
>
> Questions from me
>  - How will this be tested, how will its QA status and lifecycle be
> defined? (per above)
>  - With existing C* code needing to be changed, what is the proposed plan
> for making those changes while maintaining QA, e.g. are there separate QA
> cycles planned for altering the SPI before adding a new SPI implementation?
>  - Despite being out of scope, it would be nice to have some idea from the
> CEP author of when users might still choose 2i or SASI afresh over SAI.
>  - Who fills the roles involved? Who are the contributors in this DataStax
> team? Who is the shepherd? Are there other stakeholders willing to be
> involved?
>  - Is there a preference to use gdoc instead of the project's wiki, and
> why? (the CEP process suggests a wiki page, and feedback on why another
> approach is considered better helps evolve the CEP process itself)
>
> cheers,
> Mick
>
