cassandra-commits mailing list archives

From "Jordan West (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-10661) Integrate SASI to Cassandra
Date Sat, 23 Jan 2016 19:43:40 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113918#comment-15113918 ]

Jordan West edited comment on CASSANDRA-10661 at 1/23/16 7:42 PM:
------------------------------------------------------------------

bq. Is there also a way to query a SASI-indexed column by exact value? I mean, it seems as
if by enabling prefix or contains, that it will always query by prefix or contains. For example,
if I want to query for full first name, like where their full first name really is "J" and
not get "John" and "James" as well, while at other times I am indeed looking for names starting
with a prefix of "Jo" for "John", "Joseph", etc.

The example is correct, but this is a limitation of CQL rather than of SASI. We decided not to
extend the grammar further, since we have already had to defer some of our grammar changes to
later phases (removing OR, grouping, and != support for now). Ideally, `=` would mean exact match
and CQL would support a `LIKE` operator similar to SQL's; depending on whether the index was
created with `PREFIX` or `CONTAINS`, we would allow or disallow forms such as `%Jo%` or `_j%`.
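
To make the intended semantics concrete, here is a rough Python sketch of how `=`-style exact matching could differ from `LIKE`-style prefix and contains matching, and how an index mode might allow or reject a pattern form. This is only a model of the idea described above, not SASI or CQL code: the names `classify`, `matches`, `query`, and `ALLOWED` are hypothetical, the mapping from mode to permitted forms is an assumption, and the single-character `_` wildcard is deliberately left out.

```python
# Hypothetical model of the ideal semantics; not SASI's implementation.

# Assumption: which pattern forms each index mode would accept.
ALLOWED = {
    "PREFIX": {"exact", "prefix"},
    "CONTAINS": {"exact", "prefix", "suffix", "contains"},
}


def classify(pattern: str) -> str:
    """Classify a pattern by where its '%' wildcards appear."""
    body = pattern.strip("%")
    if not body or "%" in body or "_" in body:
        raise ValueError("only leading/trailing '%' wildcards are modeled")
    leading = pattern.startswith("%")
    trailing = pattern.endswith("%")
    if leading and trailing:
        return "contains"
    if trailing:
        return "prefix"
    if leading:
        return "suffix"
    return "exact"


def matches(value: str, pattern: str) -> bool:
    """Return True if value satisfies the pattern (case-sensitive)."""
    kind = classify(pattern)
    term = pattern.strip("%")
    if kind == "exact":
        return value == term
    if kind == "prefix":
        return value.startswith(term)
    if kind == "suffix":
        return value.endswith(term)
    return term in value


def query(values, pattern: str, mode: str):
    """Filter values by pattern, rejecting forms the mode does not allow."""
    kind = classify(pattern)
    if kind not in ALLOWED[mode]:
        raise ValueError(f"{kind} match is not allowed on a {mode} index")
    return [v for v in values if matches(v, pattern)]


names = ["J", "John", "James", "Joseph", "Major"]
print(query(names, "J", "PREFIX"))       # ['J']
print(query(names, "Jo%", "PREFIX"))     # ['John', 'Joseph']
print(query(names, "%Jo%", "CONTAINS"))  # ['John', 'Joseph']
```

Under these assumptions, asking for `%Jo%` against a `PREFIX` index raises an error instead of silently falling back to a different kind of match.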

bq. Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which case, would
I be better off with a SPARSE index for first_name_full, or would a traditional Cassandra
non-custom index work fine (or even better.)

It does, but so do all queries on numerical data, which, thinking about it, may make the `PREFIX`
option confusing for numeric types. `SPARSE` is intended to improve query performance on numerical
data where there are a large number of terms (e.g. timestamps) but a small number of keys per
term (e.g. some timeseries data). `SPARSE` should not be used on every numerical column, and it
is not an ideal setting for most non-numerical data either. For example, in a large data set of
first names, the number of distinct names will be small compared to the number of keys. Given that
distribution, using `SPARSE` will increase the size of the index and will at best have no
effect on query performance; it may even hurt it.
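
One way to check whether a column has the shape described above is to count the distinct terms and the average number of keys per term on a sample of the data. The Python sketch below does only that; the helper name `keys_per_term` and the sample data are made up for illustration, and it says nothing about any threshold SASI itself may apply.

```python
from collections import defaultdict


def keys_per_term(rows):
    """Given (key, term) pairs, return (distinct_terms, avg_keys_per_term)."""
    index = defaultdict(set)
    for key, term in rows:
        index[term].add(key)
    n_terms = len(index)
    if n_terms == 0:
        return 0, 0.0
    avg = sum(len(keys) for keys in index.values()) / n_terms
    return n_terms, avg


# Timestamp-like column: many terms, about one key per term.
timestamps = [(i, 1453577000 + i) for i in range(1000)]
print(keys_per_term(timestamps))   # (1000, 1.0)

# First-name column: few terms, many keys per term.
first_names = [(i, ["John", "James", "Joseph"][i % 3]) for i in range(999)]
print(keys_per_term(first_names))  # (3, 333.0)
```

The first column is the kind of data `SPARSE` is aimed at; the second is the first-name case, where a handful of terms each map to a large number of keys.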

was (Author: jrwest):
bq. Is there also a way to query a SASI-indexed column by exact value? I mean, it seems as
if by enabling prefix or contains, that it will always query by prefix or contains. For example,
if I want to query for full first name, like where their full first name really is "J" and
not get "John" and "James" as well, while at other times I am indeed looking for names starting
with a prefix of "Jo" for "John", "Joseph", etc.

The example is correct, but this is a limitation of CQL rather than of SASI. We decided not to
extend the grammar further, since we have already had to defer some of our grammar changes to
later phases (removing OR, grouping, and != support for now). Ideally, CQL would support a
`LIKE` operator similar to SQL's; depending on whether the index was created with `PREFIX` or
`CONTAINS`, we would allow or disallow forms such as `%Jo%` or `_j%`.

bq. Will SPARSE mode in fact give me an exact match? (Sounds like it.) In which case, would
I be better off with a SPARSE index for first_name_full, or would a traditional Cassandra
non-custom index work fine (or even better.)

It does, but so do all queries on numerical data, which, thinking about it, may make the `PREFIX`
option confusing for numeric types. `SPARSE` is intended to improve query performance on numerical
data where there are a large number of terms (e.g. timestamps) but a small number of keys per
term (e.g. some timeseries data). `SPARSE` should not be used on every numerical column, and it
is not an ideal setting for most non-numerical data either. For example, in a large data set of
first names, the number of distinct names will be small compared to the number of keys. Given that
distribution, using `SPARSE` will increase the size of the index and will at best have no
effect on query performance; it may even hurt it.


> Integrate SASI to Cassandra
> ---------------------------
>
>                 Key: CASSANDRA-10661
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10661
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths
>            Reporter: Pavel Yaskevich
>            Assignee: Pavel Yaskevich
>              Labels: sasi
>             Fix For: 3.x
>
>
> We have recently released a new secondary index engine (https://github.com/xedin/sasi)
built using the SecondaryIndex API. There are still a couple of things to work out regarding 3.x,
since it currently targets the 2.0 release. I want to make this an umbrella issue for all
of the things related to integrating SASI, which are also tracked in [sasi_issues|https://github.com/xedin/sasi/issues],
into the mainline Cassandra 3.x release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
