cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From DuyHai Doan <doanduy...@gmail.com>
Subject Re: How to query '%' character using LIKE operator in Cassandra 3.7?
Date Sat, 24 Sep 2016 18:09:52 GMT
Reminder, right now, the % character is only interpreted as wildcard IF AND
ONLY IF it is the first/last character of the searched term


LIKE '%escape' --> ENDS WITH 'escape'

If we use % to escape %,
LIKE '%%escape' -->  EQUALS TO '%escape'

LIKE '%%%escape' --> EQUALS TO '%%escape' ???




On Fri, Sep 23, 2016 at 5:02 PM, Mikhail Krupitskiy <
mikhail.krupitskiy@jetbrains.com> wrote:

> Hi, Jim,
>
> What pattern should be used to search “ends with  ‘%escape’ “ with your
> conception?
>
> Thanks,
> Mikhail
>
> On 22 Sep 2016, at 18:51, Jim Ancona <jim@anconafamily.com> wrote:
>
> To answer DuyHai's question without introducing new syntax, I'd suggest:
>
> LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape'
>
> So the first two %'s are translated to a literal, non-wildcard % and the
> third % is a wildcard because it's not doubled.
>
> Jim
>
> On Thu, Sep 22, 2016 at 11:40 AM, Mikhail Krupitskiy <
> mikhail.krupitskiy@jetbrains.com> wrote:
>
>> I guess that it should be similar to how it is done in SQL for LIKE
>> patterns.
>>
>> You can introduce an escape character, e.g. ‘\’.
>> Examples:
>> ‘%’ - any string
>> ‘\%’ - equal to ‘%’ character
>> ‘\%foo%’ - starts from ‘%foo’
>> ‘%%%escape’ - ends with ’escape’
>> ‘\%%’ - starts from ‘%’
>> ‘\\\%%’ - starts from ‘\%’ .
>>
>> What do you think?
>>
>> Thanks,
>> Mikhail
>>
>> On 22 Sep 2016, at 16:47, DuyHai Doan <doanduyhai@gmail.com> wrote:
>>
>> Hello Mikhail
>>
>> It's more complicated that it seems
>>
>> LIKE '%%escape' means  EQUAL TO '%escape'
>>
>> LIKE '%escape' means ENDS WITH 'escape'
>>
>> What's about LIKE '%%%escape' ????
>>
>> How should we treat this case ? Replace %% by % at the beginning of the
>> searched term ??
>>
>>
>>
>> On Thu, Sep 22, 2016 at 3:31 PM, Mikhail Krupitskiy <
>> mikhail.krupitskiy@jetbrains.com> wrote:
>>
>>> Hi!
>>>
>>> We’ve talked about two items:
>>> 1) ‘%’ as a wildcard in the middle of LIKE pattern.
>>> 2) How to escape ‘%’ to be able to find strings with the ‘%’ char with
>>> help of LIKE.
>>>
>>> Item #1was resolved as CASSANDRA-12573.
>>>
>>> Regarding to item #2: you said the following:
>>>
>>> A possible fix would be:
>>>
>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
>>> the column data type)
>>> 2) remove the escape character e.g. before parsing OR use some advanced
>>> regex to exclude the %% from parsing e.g
>>>
>>> Step 2) is dead easy but step 1) is harder because I don't know if
>>> converting the bytebuffer into String at this stage of the CQL parser is
>>> expensive or not (in term of computation)
>>>
>>> Let me try a patch
>>>
>>> So is there any update on this?
>>>
>>> Thanks,
>>> Mikhail
>>>
>>>
>>> On 20 Sep 2016, at 18:38, Mikhail Krupitskiy <
>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>
>>> Hi!
>>>
>>> Have you had a chance to try your patch or solve the issue in an other
>>> way?
>>>
>>> Thanks,
>>> Mikhail
>>>
>>> On 15 Sep 2016, at 16:02, DuyHai Doan <doanduyhai@gmail.com> wrote:
>>>
>>> Ok so I've found the source of the issue, it's pretty well hidden
>>> because it is NOT in the SASI source code directly.
>>>
>>> Here is the method where C* determines what kind of LIKE expression
>>> you're using (LIKE_PREFIX , LIKE CONTAINS or LIKE_MATCHES)
>>>
>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/
>>> apache/cassandra/cql3/restrictions/SingleColumnRestriction.j
>>> ava#L733-L778
>>>
>>> As you can see, it's pretty simple, maybe too simple. Indeed, they
>>> forget to remove escape character BEFORE doing the matching so if your
>>> search is LIKE '%%esc%', the detected expression is LIKE_CONTAINS.
>>>
>>> A possible fix would be:
>>>
>>> 1) convert the bytebuffer into plain String (UTF8 or ASCII, depending on
>>> the column data type)
>>> 2) remove the escape character e.g. before parsing OR use some advanced
>>> regex to exclude the %% from parsing e.g
>>>
>>> Step 2) is dead easy but step 1) is harder because I don't know if
>>> converting the bytebuffer into String at this stage of the CQL parser is
>>> expensive or not (in term of computation)
>>>
>>> Let me try a patch
>>>
>>>
>>>
>>> On Wed, Sep 14, 2016 at 9:42 AM, DuyHai Doan <doanduyhai@gmail.com>
>>> wrote:
>>>
>>>> Ok you're right, I get your point
>>>>
>>>> LIKE '%%esc%' --> startWith('%esc')
>>>>
>>>> LIKE 'escape%%' -->  = 'escape%'
>>>>
>>>> What I strongly suspect is that in the source code of SASI, we parse
>>>> the % xxx % expression BEFORE applying escape. That will explain the
>>>> observed behavior. E.g:
>>>>
>>>> LIKE '%%esc%'  parsed as %xxx% where xxx = %esc
>>>>
>>>> LIKE 'escape%%' parsed as xxx% where xxx =escape%
>>>>
>>>> Let me check in the source code and try to reproduce the issue
>>>>
>>>>
>>>>
>>>> On Tue, Sep 13, 2016 at 7:24 PM, Mikhail Krupitskiy <
>>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>>
>>>>> Looks like we have different understanding of what results are
>>>>> expected.
>>>>> I based my understanding on http://docs.datastax.com/en
>>>>> /cql/3.3/cql/cql_using/useSASIIndex.html
>>>>> According to the doc ‘esc’ is a pattern for exact match and I guess
>>>>> that there is no semantical difference between two LIKE patterns (both
of
>>>>> patterns should be treated as ‘exact match'): ‘%%esc’ and ‘esc’.
>>>>>
>>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>>>>> *%esc*
>>>>>
>>>>> Why ‘containing’? I expect that it should be ’starting’..
>>>>>
>>>>>
>>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>>>> *escape%*esc
>>>>>
>>>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>>>
>>>>> Also I expect that “ LIKE ‘%s%sc%’ ” will return ‘escape%esc’
but it
>>>>> returns nothing (CASSANDRA-12573).
>>>>>
>>>>> What I’m missing?
>>>>>
>>>>> Thanks,
>>>>> Mikhail
>>>>>
>>>>> On 13 Sep 2016, at 19:31, DuyHai Doan <doanduyhai@gmail.com> wrote:
>>>>>
>>>>> CREATE CUSTOM INDEX ON test.escape(val) USING '
>>>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'mode':
>>>>> 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sa
>>>>> si.analyzer.NonTokenizingAnalyzer', 'case_sensitive': 'false'};
>>>>>
>>>>> I don't see any problem in the results you got
>>>>>
>>>>> SELECT * FROM escape WHERE val LIKE '%%esc%'; --> Give all results
>>>>> *containing* '%esc' so *%esc*apeme is a possible match and also escape
>>>>> *%esc*
>>>>>
>>>>> Why ‘containing’? I expect that it should be ’starting’..
>>>>>
>>>>>
>>>>> SELECT * FROM escape WHERE val LIKE 'escape%%' --> Give all results
>>>>> *starting* with 'escape%' so *escape%*me is a valid result and also
>>>>> *escape%*esc
>>>>>
>>>>> Why ’starting’? I expect that it should be ‘exact matching’.
>>>>>
>>>>>
>>>>> On Tue, Sep 13, 2016 at 5:58 PM, Mikhail Krupitskiy <
>>>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>>>
>>>>>> Thanks for the reply.
>>>>>> Could you please provide what index definition did you use?
>>>>>> With the index from my script I get the following results:
>>>>>>
>>>>>> cqlsh:test> select * from escape;
>>>>>>
>>>>>>  id | val
>>>>>> ----+-----------
>>>>>>   1 | %escapeme
>>>>>>   2 | escape%me
>>>>>> *  3 | escape%esc*
>>>>>>
>>>>>> Contains search
>>>>>>
>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>>>
>>>>>>  id | val
>>>>>> ----+-----------
>>>>>>   1 | %escapeme
>>>>>>   3
>>>>>> * | escape%esc*(2 rows)
>>>>>>
>>>>>>
>>>>>> Prefix search
>>>>>>
>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>>>
>>>>>>  id | val
>>>>>> ----+-----------
>>>>>>   2 | escape%me
>>>>>>   3
>>>>>> * | escape%esc*
>>>>>>
>>>>>> Thanks,
>>>>>> Mikhail
>>>>>>
>>>>>> On 13 Sep 2016, at 18:16, DuyHai Doan <doanduyhai@gmail.com>
wrote:
>>>>>>
>>>>>> Use % to escape %
>>>>>>
>>>>>> cqlsh:test> select * from escape;
>>>>>>
>>>>>>  id | val
>>>>>> ----+-----------
>>>>>>   1 | %escapeme
>>>>>>   2 | escape%me
>>>>>>
>>>>>>
>>>>>> Contains search
>>>>>>
>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE '%%esc%';
>>>>>>
>>>>>>  id | val
>>>>>> ----+-----------
>>>>>>   1 | %escapeme
>>>>>>
>>>>>> (1 rows)
>>>>>>
>>>>>>
>>>>>> Prefix search
>>>>>>
>>>>>> cqlsh:test> SELECT * FROM escape WHERE val LIKE 'escape%%';
>>>>>>
>>>>>>  id | val
>>>>>> ----+-----------
>>>>>>   2 | escape%me
>>>>>>
>>>>>> On Tue, Sep 13, 2016 at 5:06 PM, Mikhail Krupitskiy <
>>>>>> mikhail.krupitskiy@jetbrains.com> wrote:
>>>>>>
>>>>>>> Hi Cassandra guys,
>>>>>>>
>>>>>>> I use Cassandra 3.7 and wondering how to use ‘%’ as a simple
char in
>>>>>>> a search pattern.
>>>>>>> Here is my test script:
>>>>>>>
>>>>>>> DROP keyspace if exists kmv;
>>>>>>> CREATE keyspace if not exists kmv WITH REPLICATION = { 'class'
:
>>>>>>> 'SimpleStrategy', 'replication_factor':'1'} ;
>>>>>>> USE kmv;
>>>>>>> CREATE TABLE if not exists kmv (id int, c1 text, c2 text, PRIMARY
>>>>>>> KEY(id, c1));
>>>>>>> CREATE CUSTOM INDEX ON kmv.kmv  ( c2 ) USING '
>>>>>>> org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {
>>>>>>> 'analyzed' : 'true',
>>>>>>> 'analyzer_class' : 'org.apache.cassandra.index.sa
>>>>>>> si.analyzer.NonTokenizingAnalyzer',
>>>>>>> 'case_sensitive' : 'false',
>>>>>>> 'mode' : 'CONTAINS'
>>>>>>> };
>>>>>>>
>>>>>>> INSERT into kmv (id, c1, c2) values (1, 'f22', 'qwe%asd');
>>>>>>> INSERT into kmv (id, c1, c2) values (2, 'f22', '%asd');
>>>>>>> INSERT into kmv (id, c1, c2) values (3, 'f22', 'asd%');
>>>>>>> INSERT into kmv (id, c1, c2) values (4, 'f22', 'asd%1');
>>>>>>> INSERT into kmv (id, c1, c2) values (5, 'f22', 'qweasd');
>>>>>>>
>>>>>>> SELECT c2 from kmv.kmv where c2 like ‘_pattern_';
>>>>>>>
>>>>>>> _pattern_ '%%%' finds all columns that contain %.
>>>>>>> How to find columns that start form ‘%’ or ‘%a’?
>>>>>>> How to find columns that end with ‘%’?
>>>>>>> What about more complex patterns: '%qwe%a%sd%’? How to differentiate
>>>>>>> ‘%’ char form % as a command symbol? (Also there is a related
issue
>>>>>>> CASSANDRA-12573).
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Mikhail
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
>

Mime
View raw message