cassandra-commits mailing list archives

From "Pavel Yaskevich (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11130) [SASI Pre-QA] = semantics not respected when using StandardAnalyzer
Date Mon, 08 Feb 2016 03:16:39 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15136570#comment-15136570 ]

Pavel Yaskevich commented on CASSANDRA-11130:
---------------------------------------------

||branch||testall||dtest||
|CASSANDRA-11130|[testall|http://cassci.datastax.com/job/xedin-CASSANDRA-11130-testall/]|[dtest|http://cassci.datastax.com/job/xedin-CASSANDRA-11130-dtest/]|


> [SASI Pre-QA] = semantics not respected when using StandardAnalyzer
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-11130
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11130
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>         Environment: Tested from build [CASSANDRA-11067|https://issues.apache.org/jira/browse/CASSANDRA-11067]
>            Reporter: DOAN DuyHai
>            Assignee: Pavel Yaskevich
>             Fix For: 3.4
>
>
> Tested from build [CASSANDRA-11067|https://issues.apache.org/jira/browse/CASSANDRA-11067]
> {code:sql}
> CREATE KEYSPACE music WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;
> CREATE TABLE music.albums (
>     id int PRIMARY KEY,
>     artist text,
>     title1 text,
>     title2 text
> );
> CREATE CUSTOM INDEX ON music.albums (title1) USING 'org.apache.cassandra.index.sasi.SASIIndex'
> WITH OPTIONS = {'tokenization_skip_stop_words': 'true', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
> 'case_sensitive': 'false', 'mode': 'PREFIX', 'tokenization_enable_stemming': 'true'};
> CREATE CUSTOM INDEX ON music.albums (title2) USING 'org.apache.cassandra.index.sasi.SASIIndex'
> WITH OPTIONS = {'tokenization_skip_stop_words': 'true', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
> 'case_sensitive': 'false', 'mode': 'CONTAINS', 'tokenization_enable_stemming': 'true'};
> INSERT INTO music.albums(id, artist, title1, title2) 
> VALUES(1, 'Superpitcher', 'Yesterday', 'Yesterday');
> INSERT INTO music.albums(id, artist, title1, title2) 
> VALUES(2, 'Hilary Duff', 'So Yesterday', 'So Yesterday');
> INSERT INTO music.albums(id, artist, title1, title2) 
> VALUES(3, 'The Mr. T Experience', 'Yesterday Rules', 'Yesterday Rules');
> SELECT artist,title1 FROM music.albums WHERE title1='Yesterday';
>  artist                 | title1
> ------------------------+----------------
>            Superpitcher |       Yesterday
>             Hilary Duff |    So Yesterday
>    The Mr. T Experience | Yesterday Rules
>  
> (3 rows)
> SELECT artist,title1 FROM music.albums WHERE title2='Yesterday';
>  artist                 | title1
> ------------------------+----------------
>            Superpitcher |       Yesterday
>             Hilary Duff |    So Yesterday
>    The Mr. T Experience | Yesterday Rules
>   
> (3 rows)
> {code}
> The semantics of *=* are not respected. SASI should return only 1 row, the exact match.
> Using *LIKE* would return all 3 rows. This affects both *PREFIX* and *CONTAINS* modes. Using
> *NonTokenizingAnalyzer* returns 1 row, the exact match.
> So the semantics of *=* depend on the chosen analyzer, which is inconsistent.
> We should force *=* to be an exact match no matter which analyzer is chosen.
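
A minimal sketch of why the chosen analyzer can change what *=* returns, using the three titles from the report. This is not SASI code: the function names, the tiny stop-word list, and the lookup logic are all hypothetical stand-ins, and stemming is left out. It only models the idea that a tokenizing analyzer indexes each word as its own term, while a non-tokenizing one indexes the whole value as a single term.

```python
import re

# Assumed, deliberately tiny stop-word list (not the one SASI ships).
STOP_WORDS = {"so", "the", "a", "an"}


def tokenizing_terms(value):
    """Model of a tokenizing analyzer: lowercase, split into words, drop stop words."""
    words = re.findall(r"[a-z0-9]+", value.lower())
    return {w for w in words if w not in STOP_WORDS}


def non_tokenizing_terms(value):
    """Model of a non-tokenizing analyzer: the whole lowercased value is one term."""
    return {value.lower()}


def build_index(rows, analyzer):
    """Map each term produced by the analyzer to the ids of the rows containing it."""
    index = {}
    for row_id, title in rows.items():
        for term in analyzer(title):
            index.setdefault(term, set()).add(row_id)
    return index


def equals_lookup(index, analyzer, literal):
    """Analyze the literal the same way, then intersect the row ids of its terms."""
    result = None
    for term in analyzer(literal):
        ids = index.get(term, set())
        result = ids if result is None else result & ids
    return sorted(result or [])


# Row ids and titles taken from the INSERT statements in the report.
ROWS = {1: "Yesterday", 2: "So Yesterday", 3: "Yesterday Rules"}

tokenized = equals_lookup(build_index(ROWS, tokenizing_terms), tokenizing_terms, "Yesterday")
exact = equals_lookup(build_index(ROWS, non_tokenizing_terms), non_tokenizing_terms, "Yesterday")

print(tokenized)  # [1, 2, 3]
print(exact)      # [1]
```

In this model every title contributes the term "yesterday" once it is tokenized, so a term lookup cannot tell the exact match apart from the other two rows; with the whole value kept as one term, only row 1 matches.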



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
