cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From DuyHai Doan <doanduy...@gmail.com>
Subject Re: How to query '%' character using LIKE operator in Cassandra 3.7?
Date Wed, 14 Sep 2016 07:42:06 GMT
Ok you're right, I get your point

LIKE '%%esc%' --> startWith('%esc')

LIKE 'escape%%' -->  = 'escape%'

What I strongly suspect is that in the source code of SASI, we parse the %
xxx % expression BEFORE applying escape. That will explain the observed
behavior. E.g:

LIKE '%%esc%'  parsed as %xxx% where xxx = %esc

LIKE 'escape%%' parsed as xxx% where xxx =escape%

Let me check in the source code and try to reproduce the issue



On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <
mikhail.krupitskiy@jetbrains.com> wrote:

> Looks like we have different understanding of what results are expected.
> I based my understanding on http://docs.datastax.com/
> en/cql/3.3/cql/cql_using/useSASIIndex.html
> According to the doc ‘esc’ is a pattern for exact match and I guess that
> there is no semantical difference between two LIKE patterns (both of
> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>
> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
> *containing* '%esc' so *%esc*apeme is a possible match and also escape
> *%esc*
>
> Why ‘containing’? I expect that it should be ’starting’..
>
>
> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
> *starting* with 'escape%' so *escape%*me is a valid result and also
> *escape%*esc
>
> Why ’starting’? I expect that it should be ‘exact matching’.
>
> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’ but it
> returns nothing (CASSANDRA-12573).
>
> What I’m missing?
>
> Thanks,
> Mikhail
>
> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduyhai@gmail.com> wrote:
>
> CREATE CUSTOM INDEX ON test.escape(val) USING 'org.apache.cassandra.index.sasi.SASIIndex'
> WITH OPTIONS = {'mode': 'CONTAINS', 'analyzer_class':
> 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
> 'case_sensitive': 'false'};
>
> I don't see any problem in the results you got
>
> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
> *containing* '%esc' so *%esc*apeme is a possible match and also escape
> *%esc*
>
> Why ‘containing’? I expect that it should be ’starting’..
>
>
> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
> *starting* with 'escape%' so *escape%*me is a valid result and also
> *escape%*esc
>
> Why ’starting’? I expect that it should be ‘exact matching’.
>
>
> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <
> mikhail.krupitskiy@jetbrains.com> wrote:
>
>> Thanks for the reply.
>> Could you please provide what index definition did you use?
>> With the index from my script I get the following results:
>>
>> cqlsh:test> select * from escape;
>>
>>  id | val
>> ----+-----------
>>   1 | %escapeme
>>   2 | escape%me
>> *  3 | escape%esc*
>>
>> Contains search
>>
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>
>>  id | val
>> ----+-----------
>>   1 | %escapeme
>>   3
>> * | escape%esc*(2 rows)
>>
>>
>> Prefix search
>>
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>
>>  id | val
>> ----+-----------
>>   2 | escape%me
>>   3
>> * | escape%esc*
>>
>> Thanks,
>> Mikhail
>>
>> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduyhai@gmail.com> wrote:
>>
>> Use % to escape %
>>
>> cqlsh:test> select * from escape;
>>
>>  id | val
>> ----+-----------
>>   1 | %escapeme
>>   2 | escape%me
>>
>>
>> Contains search
>>
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>
>>  id | val
>> ----+-----------
>>   1 | %escapeme
>>
>> (1 rows)
>>
>>
>> Prefix search
>>
>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>
>>  id | val
>> ----+-----------
>>   2 | escape%me
>>
>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <
>> mikhail.krupitskiy@jetbrains.com> wrote:
>>
>>> Hi Cassandra guys,
>>>
>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple char in a
>>> search pattern.
>>> Here is my test script:
>>>
>>> DROP keyspace if exists kmv;
>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class' :
>>> 'SimpleStrategy', 'replication_factor':'1'} ;
>>> USE kmv;
>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY
>>> KEY(id, c1));
>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING '
>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>>> 'analyzed' : 'true',
>>> 'analyzer_class' : 'org.apache.cassandra.index.sa
>>> si.analyzer.NonTokenizingAnalyzer',
>>> 'case_sensitive' : 'false',
>>> 'mode' : 'CONTAINS'
>>> };
>>>
>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>>
>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>>
>>> _pattern_ '%%%' finds all columns that contain %.
>>> How to find columns that start form ‘%’ or ‘%a’?
>>> How to find columns that end with ‘%’?
>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate ‘%’
>>> char form % as a command symbol? (Also there is a related issue
>>> CASSANDRA-12573).
>>>
>>>
>>> Thanks,
>>> Mikhail
>>
>>
>>
>>
>
>

Mime
View raw message