cassandra-commits mailing list archives

From "DOAN DuyHai (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Deleted] (CASSANDRA-12573) SASI index. Incorrect results for '%foo%bar%'-like search pattern.
Date Thu, 15 Sep 2016 16:16:21 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-12573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

DOAN DuyHai updated CASSANDRA-12573:
------------------------------------
    Comment: was deleted

(was: Right, the escaping issue does not matter here. What we want to understand is how SASI
interprets the {{%}} in the middle of the term.

Please note that you're using C* 3.7. I have contributed a bug fix (scheduled for 3.9 and
already in trunk) for stop-word skipping being applied after stemming when it should be
applied before. I'm not sure whether it is relevant to the data set here, but it rings a
bell, since you get weird behavior only when using StandardAnalyzer.)

> SASI index. Incorrect results for '%foo%bar%'-like search pattern. 
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-12573
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12573
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Mikhail Krupitskiy
>            Assignee: Alex Petrov
>            Priority: Critical
>              Labels: sasi
>
> We use Cassandra 3.7 and have run into strange behaviour of SELECT requests with "LIKE '%foo%bar%'" constraints on a column with a SASI index.
> Below are a few experiments that show this behaviour.
> Experiment 1:
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor':'1'};
> use kmv;
> CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text);
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (4, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (5, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: no rows.
> Experiment 2 (NOTE: definition of index is changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor':'1'};
> use kmv;
> CREATE TABLE if not exists kmv (id int primary key, c1 text, c2 text);
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS',
>  'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>  'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (2, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (3, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (4, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (5, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: asdqwe, qweasd, qwea1.
> Experiment 3 (NOTE: primary key is compound now and inserted data was changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor':'1'};
> use kmv;
> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, c1));
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS',
>  'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>  'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (1, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (1, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w%a%';
> {noformat}
> Expected result: qweasd, qwea1.
> Actual result: qwe, qweasd, qwea1, 1qwe, asdqwe.
> Experiment 4 (NOTE: search criteria is changed):
> {noformat}
> drop keyspace if exists kmv;
> create keyspace if not exists kmv WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor':'1'};
> use kmv;
> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY KEY(id, c1));
> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>  'mode': 'CONTAINS',
>  'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
>  'analyzed': 'true'
> };
> insert into kmv (id, c1, c2) values (1, 'f21', 'qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f22', 'qweasd') ;
> insert into kmv (id, c1, c2) values (1, 'f23', 'qwea1') ;
> insert into kmv (id, c1, c2) values (1, 'f24', '1qwe') ;
> insert into kmv (id, c1, c2) values (1, 'f25', 'asdqwe') ;
> select c2 from kmv.kmv where c2 like '%w22%a%';
> {noformat}
> Expected result: no rows.
> Actual result: qweasd, qwea1, asdqwe.
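
For reference, the "Expected result" lines in the experiments above follow SQL-style LIKE semantics, where every {{%}} matches any run of characters. The sketch below only models that expectation over the five inserted c2 values; it is not how SASI evaluates the pattern, and the helper names like_to_regex and like_filter are hypothetical.

```python
import re

# The five c2 values inserted in every experiment above.
VALUES = ["qwe", "qweasd", "qwea1", "1qwe", "asdqwe"]

def like_to_regex(pattern):
    """Translate a LIKE pattern into a regex, treating each '%' as 'any run of characters'.

    Assumption: plain SQL-style semantics as the reporter expects them;
    no escaping of '%' and no '_' wildcard are modelled here.
    """
    literal_parts = pattern.split("%")
    return re.compile(".*".join(re.escape(p) for p in literal_parts), re.DOTALL)

def like_filter(values, pattern):
    """Return the values that match the whole LIKE pattern, in input order."""
    rx = like_to_regex(pattern)
    return [v for v in values if rx.fullmatch(v)]

if __name__ == "__main__":
    print(like_filter(VALUES, "%w%a%"))    # experiments 1 to 3
    print(like_filter(VALUES, "%w22%a%"))  # experiment 4
```

Under these assumptions the first call yields qweasd and qwea1, and the second yields no rows, which is what the reporter lists as expected.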



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
