cassandra-commits mailing list archives

From "Jack Krupansky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-11067) Improve SASI syntax
Date Thu, 04 Feb 2016 16:17:39 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-11067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15132517#comment-15132517 ]

Jack Krupansky commented on CASSANDRA-11067:
--------------------------------------------

For reference, over in Solr land users constantly struggle with how to combine exact and partial
matching - sometimes they want an absolute literal match for the full field/column, sometimes
a wildcard on that full field, sometimes keyword tokenization, sometimes wildcard on tokenized
terms, sometimes phrases of tokenized terms, and sometimes phrases from the full literal string.
Unfortunately, Solr doesn't have a direct answer for that, so people are forced to copy the
field (typically with a <copyField> directive) so that one field holds the literal string
and the other is the tokenized field. That gives them complete control at query time, so q=name_literal:Joe
would only match when the full name is Joe while q=name_tokenized:joe would match for any
name with joe. Similarly, q=name_literal:Jo* would only match names with Jo as a prefix, while
q=name_tokenized:jo* would match Joe Smith as well as Bill Johnson.
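
To make the distinction concrete, here is a toy Python sketch of the two kinds of field. It is
not Solr's analysis chain: the function names are hypothetical, and the tokenized side assumes
plain whitespace splitting plus lowercasing, which is only one possible analyzer configuration.

```python
def literal_match(value, query):
    """Match against the whole, untokenized field value.

    A trailing '*' is treated as a prefix wildcard on the full string.
    """
    if query.endswith("*"):
        return value.startswith(query[:-1])
    return value == query


def tokenized_match(value, query):
    """Match against individual tokens of the field value.

    Assumption: tokens are whitespace-separated and lowercased.
    A trailing '*' is treated as a prefix wildcard on each token.
    """
    tokens = value.lower().split()
    q = query.lower()
    if q.endswith("*"):
        return any(token.startswith(q[:-1]) for token in tokens)
    return q in tokens


names = ["Joe", "Joe Smith", "Bill Johnson"]

literal_exact = [n for n in names if literal_match(n, "Joe")]
tokenized_exact = [n for n in names if tokenized_match(n, "joe")]
literal_prefix = [n for n in names if literal_match(n, "Jo*")]
tokenized_prefix = [n for n in names if tokenized_match(n, "jo*")]
```

In this sketch only the literal prefix query skips Bill Johnson, because Jo is a prefix of one
of his tokens but not of the full string.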

The user might also opt to copy to yet a third field, which is tokenized but with the so-called
keyword tokenizer, which permits the string to be normalized but not broken into tokens. The
common case is to lowercase, but other common cases would be to eliminate punctuation, replace
certain prefixes and suffixes, or whatever.
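
A rough Python sketch of that third option follows: the whole string is normalized and then
compared as a single unit, never split into tokens. The helper names are hypothetical, and the
specific normalization steps (lowercasing, dropping punctuation, collapsing whitespace) are
assumptions chosen for illustration rather than anything Solr prescribes.

```python
import re


def keyword_normalize(value):
    """Normalize the full string but keep it as one unit (no tokenization)."""
    lowered = value.lower()
    # Assumption: drop anything that is not a word character or whitespace.
    no_punct = re.sub(r"[^\w\s]", "", lowered)
    # Assumption: collapse runs of whitespace so spacing differences are ignored.
    return " ".join(no_punct.split())


def normalized_exact_match(value, query):
    """Exact match on the normalized full string."""
    return keyword_normalize(value) == keyword_normalize(query)
```

With these assumptions, "Joe  Smith." matches the query "joe smith", while "Joe Smith" still
does not match the query "joe", since the comparison is against the whole normalized string.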

The real point there is that "exact" match is still a range of possibilities.

One of the issues here for Cassandra is whether you really want to combine these two separate
exactness semantics that Solr keeps separate.

> Improve SASI syntax
> -------------------
>
>                 Key: CASSANDRA-11067
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11067
>             Project: Cassandra
>          Issue Type: Task
>          Components: CQL
>            Reporter: Jonathan Ellis
>            Assignee: Pavel Yaskevich
>             Fix For: 3.4
>
>
> I think everyone agrees that a LIKE operator would be ideal, but that's probably not
> in scope for an initial 3.4 release.
> Still, I'm uncomfortable with the initial approach of overloading = to mean "satisfies
> index expression."  The problem is that it will be very difficult to back out of this behavior
> once people are using it.
> I propose adding a new operator in the interim instead.  Call it MATCHES, maybe.  With
> the exact same behavior that SASI currently exposes, just with a separate operator rather
> than being rolled into =.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)