asterixdb-dev mailing list archives

From Mike Carey <dtab...@gmail.com>
Subject Re: Function name change: contains() -> string-contains()
Date Fri, 16 Sep 2016 22:36:43 GMT
+2  :-)


On 9/16/16 2:06 PM, Yingyi Bu wrote:
> Cool, +1!
>
> Best,
> Yingyi
>
> On Fri, Sep 16, 2016 at 1:54 PM, Taewoo Kim <wangsaeu@gmail.com> wrote:
>
>> So, in summary, we agree to use a function format for the full-text search,
>> rather than using XQuery syntax. "contains" doesn't have to be
>> "string-contains" and "text" doesn't have to be a reserved word.
>>
>> The possible syntax would be:
>>
>> *ftcontains*(expression1, expression2, parameter record expression)
>> *matches*(expression1, expression2, parameter record expression)
>>
>> Expression1 is the field on which we conduct the full-text search.
>> Expression2 contains the keyword(s) that will be searched for in
>> Expression1.
>> Parameter Record Expression contains the parameters in a record format.
>>
>> An example could be: ftcontains($o.title, ["hello","hi"], {"mode":"all"})
>> which checks whether $o.title contains both "hello" and "hi".
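This is a minimal Python sketch of the matching semantics proposed above. It rests on several assumptions that the thread did not settle:
- tokenization splits on punctuation and whitespace;
- comparison is case-insensitive;
- "any" is the default mode.

`ftcontains` here is a hypothetical model that mirrors the proposed name. It is not AsterixDB's implementation.

```python
import re


def tokenize(text):
    # Assumed tokenizer: lowercase, then split on runs of non-alphanumeric characters.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def ftcontains(field, keywords, params=None):
    """Hypothetical model of ftcontains(expression1, expression2, params).

    keywords is a single string or a list of strings; params is a dict
    standing in for the parameter record, e.g. {"mode": "all"}.
    """
    params = params or {}
    mode = params.get("mode", "any")  # assumed default, not settled in the thread
    if isinstance(keywords, str):
        keywords = [keywords]
    tokens = set(tokenize(field))
    wanted = [k.lower() for k in keywords]
    if mode == "all":
        return all(k in tokens for k in wanted)  # conjunctive search
    if mode == "any":
        return any(k in tokens for k in wanted)  # disjunctive search
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `ftcontains("Hello world", ["hello", "hi"], {"mode": "all"})` is false, while the same call with `{"mode": "any"}` is true.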
>>
>> Chen mentioned that how to pass parameters needs a separate discussion.
>> However, for now, passing parameters in a record is a viable solution;
>> the alternative, making each parameter a separate function argument,
>> would make it harder to remember the position of each parameter.
>>
>> Best,
>> Taewoo
>>
>> On Fri, Sep 16, 2016 at 10:12 AM, Heri Ramampiaro <heriram@gmail.com> wrote:
>>
>>> +1
>>>
>>> -heri
>>>
>>>> On Sep 15, 2016, at 19:01, Chen Li <chenli@gmail.com> wrote:
>>>>
>>>> For full-text search, I like "ftcontains()" since it's very intuitive.
>>>>
>>>> Syntax for advanced full-text features such as stop words, analyzers,
>>>> and languages needs a separate discussion.
>>>>
>>>> Chen
>>>>
>>>> On Thu, Sep 15, 2016 at 5:58 PM, Taewoo Kim <wangsaeu@gmail.com> wrote:
>>>>> @Till: I see. Thanks for the suggestion. It's clearer now.
>>>>>
>>>>> Best,
>>>>> Taewoo
>>>>>
>>>>> On Thu, Sep 15, 2016 at 5:58 PM, Till Westmann <tillw@apache.org> wrote:
>>>>>> And as it turns out, we already have some infrastructure to translate a
>>>>>> constant record constructor expression into a record in
>>>>>> LangRecordParseUtil. So supporting that wouldn’t be too painful.
>>>>>>
>>>>>> Cheers,
>>>>>> Till
>>>>>>
>>>>>>
>>>>>> On 15 Sep 2016, at 17:41, Till Westmann wrote:
>>>>>>
>>>>>>> One option to express those parameters would be to pass in a (compile
>>>>>>> time constant) record/object. E.g.
>>>>>>>
>>>>>>>     where ftcontains($o.title, ["hello","hi"],
>>>>>>>                      { "combine": "and", "stop list": "default" })
>>>>>>>
>>>>>>> That way we could have named optional parameters (please ignore the
>>>>>>> ugliness of my chosen parameters), which avoid the problem of dealing
>>>>>>> with positions. We do have a nested data model, so we could put it to
>>>>>>> good use here :)
>>>>>>> Does this make sense?
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Till
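This is a minimal Python sketch of how a constant parameter record could be resolved into named optional parameters. The keys "combine" and "stop list" come from the example above. The following parts are assumptions made for illustration:
- the default values;
- the hypothetical `resolve_params` helper;
- the choice to reject unknown keys.

```python
# Hypothetical defaults: the key names come from the example above,
# the default values are assumptions for illustration only.
DEFAULTS = {"combine": "or", "stop list": "none"}


def resolve_params(record=None):
    """Merge a (compile-time constant) parameter record with defaults.

    Callers name only the options they care about, in any order, so no
    positional slots need to be remembered or padded.
    """
    record = record or {}
    unknown = set(record) - set(DEFAULTS)
    if unknown:
        # Rejecting unknown keys catches typos such as "stoplist".
        raise ValueError(f"unknown full-text parameter(s): {sorted(unknown)}")
    return {**DEFAULTS, **record}
```

Callers name only the options they set. As a result, adding a new option later does not shift the position of any existing one, which is exactly the problem the positional form runs into.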
>>>>>>>
>>>>>>> On 15 Sep 2016, at 16:26, Taewoo Kim wrote:
>>>>>>>
>>>>>>>> @Till: we can add whether the given search is an AND/OR search, the
>>>>>>>> stop list, and/or the stemming method. For example, if we use
>>>>>>>> ftcontains(), then it might look like:
>>>>>>>>
>>>>>>>> 1) where ftcontains($o.title, "hello"): find $o where the title field
>>>>>>>> contains hello.
>>>>>>>> 2) where ftcontains($o.title, ["hello","hi"], any): find $o where the
>>>>>>>> title field contains hello *and/or* hi.
>>>>>>>> 3) where ftcontains($o.title, ["hello","hi"], all): find $o where the
>>>>>>>> title field contains both hello *and* hi.
>>>>>>>> 4) where ftcontains($o.title, ["hello","hi"], all, defaultstoplist):
>>>>>>>> find $o where the title field contains both hello *and* hi. Also apply
>>>>>>>> the default stop list to the search. The default stop list contains a
>>>>>>>> number of common English words that can be filtered out.
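This is a minimal Python sketch of case 4: a conjunctive search with a stop list applied to the query keywords. The stop-list contents, the tokenizer and the `ftcontains_all` helper are all hypothetical. The thread does not define what the default stop list contains.

```python
import re

# Hypothetical stop list; the real "default" stop list was not specified.
DEFAULT_STOP_LIST = frozenset({"a", "an", "and", "in", "of", "the", "to"})


def tokenize(text):
    # Assumed tokenizer: lowercase, then split on runs of non-alphanumeric characters.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def ftcontains_all(field, keywords, stop_list=frozenset()):
    """Conjunctive ("all") search with an optional stop list.

    Stop words are dropped from both the field tokens and the query
    keywords before matching. If every keyword is a stop word, this sketch
    returns True (a vacuous match); a real engine might handle that differently.
    """
    tokens = set(tokenize(field)) - stop_list
    wanted = [k.lower() for k in keywords if k.lower() not in stop_list]
    return all(k in tokens for k in wanted)
```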
>>>>>>>>
>>>>>>>> The issue here is that the position of each parameter must be
>>>>>>>> observed (e.g., the third one indicates whether we do a
>>>>>>>> disjunctive/conjunctive search, and the fourth one tells us which stop
>>>>>>>> list we use). So, if we have three parameters, how to specify or omit
>>>>>>>> each of them becomes a challenge.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Taewoo
>>>>>>>>
>>>>>>>> On Thu, Sep 15, 2016 at 4:12 PM, Till Westmann <tillw@apache.org> wrote:
>>>>>>>>> Makes sense to me (especially as I always think about this specific
>>>>>>>>> one as "ftcontains" :) ).
>>>>>>>>>
>>>>>>>>> Another thing you mentioned is the parameters that will get added in
>>>>>>>>> the future. Could you provide an example of this?
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Till
>>>>>>>>>
>>>>>>>>> On 15 Sep 2016, at 15:37, Taewoo Kim wrote:
>>>>>>>>>
>>>>>>>>>> Maybe we could come up with a function form: *ftcontains*(). Here, ft
>>>>>>>>>> is an abbreviation for full-text. This function replaces "contains
>>>>>>>>>> text" in the XQuery spec. An example might be:
>>>>>>>>>>
>>>>>>>>>> XQuery spec: where $o.title contains text "hello"
>>>>>>>>>> AQL: where ftcontains($o.title, "hello")
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Taewoo
>>>>>>>>>>
>>>>>>>>>> On Thu, Sep 15, 2016 at 3:18 PM, Taewoo Kim <wangsaeu@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> @Till: Got it. I agree with your opinion. The issue here for
>>>>>>>>>>> full-text search is that many function parameters that control the
>>>>>>>>>>> behavior of full-text search will be added in the future. Maybe this
>>>>>>>>>>> is not an issue? :-)
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Taewoo
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Sep 15, 2016 at 3:11 PM, Till Westmann <tillw@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I think that our challenge here is that XQuery is very liberal in the
>>>>>>>>>>>> introduction of new keywords, as its grammar is keyword-free. However,
>>>>>>>>>>>> they often use combinations of words such as "contains" "text" to
>>>>>>>>>>>> disambiguate.
>>>>>>>>>>>> AQL, on the other hand, is not keyword-free, and so each time we
>>>>>>>>>>>> introduce a new keyword, we create a backwards compatibility problem.
>>>>>>>>>>>> It seems that for AQL, using a function-based syntax would create
>>>>>>>>>>>> fewer problems.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Till
>>>>>>>>>>>>
>>>>>>>>>>>> On 2 Mar 2016, at 18:25, Taewoo Kim wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hello All,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would like to suggest a change to a current function name. I am
>>>>>>>>>>>>> currently working on Full Text Search features. The XQuery Full-Text
>>>>>>>>>>>>> spec [1] states that for a full-text search, the syntax is
>>>>>>>>>>>>> *RangeExpr ( "contains" "text" FTSelection FTIgnoreOption? )?*.
>>>>>>>>>>>>> As you can see, we are going to use "contains text something". And we
>>>>>>>>>>>>> already have a contains() function [2] that does a substring match.
>>>>>>>>>>>>> So, in order to remove possible ambiguities between the two features,
>>>>>>>>>>>>> *contains()* will be renamed to *string-contains()* when I merge my
>>>>>>>>>>>>> index-only branch into master, if there is no strong opinion on
>>>>>>>>>>>>> this. I will send another note as my merge progresses. Thank you.
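This is a minimal Python sketch of why the two meanings of "contains" can collide: a substring test and a full-text token test disagree on the same inputs. Both helpers are hypothetical. The case-sensitive substring check is an assumption about contains(), not something stated in the thread.

```python
import re


def string_contains(s, sub):
    # Substring semantics, as described for the existing contains() function.
    # Case-sensitive here; the real function's case handling is not stated.
    return sub in s


def token_contains(s, word):
    # Token semantics assumed for a full-text "contains text" search:
    # lowercase, split on non-alphanumeric runs, compare whole tokens only.
    tokens = [t for t in re.split(r"[^0-9a-z]+", s.lower()) if t]
    return word.lower() in tokens
```

For example, `string_contains("Othello", "hello")` is true, because "hello" is a substring. In contrast, `token_contains("Othello", "hello")` is false, because "othello" is a different word.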
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1] https://www.w3.org/TR/xpath-full-text-10/#doc-xquery10-FTContainsExpr
>>>>>>>>>>>>>
>>>>>>>>>>>>> [2] https://asterix-jenkins.ics.uci.edu/job/asterix-test-full/site/asterix-doc/aql/functions.html#StringFunctions
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Taewoo
>>>>>>>>>>>>>

