asterixdb-dev mailing list archives

From "Till Westmann" <ti...@apache.org>
Subject Re: Function name change: contains() -> string-contains()
Date Fri, 16 Sep 2016 21:05:04 GMT
+1

On 16 Sep 2016, at 13:54, Taewoo Kim wrote:

> So, in summary, we agree to use a function format for full-text search
> rather than the XQuery syntax. "contains" doesn't have to be renamed to
> "string-contains", and "text" doesn't have to be a reserved word.
>
> The possible syntax would be:
>
> *ftcontains*(expression1, expression2, parameter record expression)
> *matches*(expression1, expression2, parameter record expression)
>
> Expression1 is the field on which we conduct the full-text search.
> Expression2 contains the keyword(s) that will be searched for in
> Expression1.
> The parameter record expression contains the parameters in a record
> format.
>
> An example could be: ftcontains($o.title, ["hello","hi"], {"mode":"all"}),
> which checks whether $o.title contains both "hello" and "hi".
>
> Chen mentioned that how to pass parameters needs a separate discussion.
> However, for now, passing the parameters in a record is a viable solution,
> unless we want to pass each parameter as a separate argument to the
> function itself. That would make it harder to remember the position of
> each parameter.
>
> Best,
> Taewoo
>
> On Fri, Sep 16, 2016 at 10:12 AM, Heri Ramampiaro <heriram@gmail.com> wrote:
>
>> +1
>>
>> -heri
>>
>>> On Sep 15, 2016, at 19:01, Chen Li <chenli@gmail.com> wrote:
>>>
>>> For full-text search, I like "ftcontains()" since it's very intuitive.
>>>
>>> Syntax for advanced full-text features such as stop words, analyzers,
>>> and languages needs a separate discussion.
>>>
>>> Chen
>>>
>>> On Thu, Sep 15, 2016 at 5:58 PM, Taewoo Kim <wangsaeu@gmail.com> wrote:
>>>
>>>> @Till: I see. Thanks for the suggestion. It's much clearer now.
>>>>
>>>> Best,
>>>> Taewoo
>>>>
>>>> On Thu, Sep 15, 2016 at 5:58 PM, Till Westmann <tillw@apache.org> wrote:
>>>>
>>>>> And as it turns out, we already have some infrastructure to translate a
>>>>> constant record constructor expression into a record in
>>>>> LangRecordParseUtil. So supporting that wouldn’t be too painful.
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>>
>>>>> On 15 Sep 2016, at 17:41, Till Westmann wrote:
>>>>>
>>>>>> One option to express those parameters would be to pass in a
>>>>>> (compile-time constant) record/object, e.g.
>>>>>>
>>>>>>    where ftcontains($o.title, ["hello","hi"],
>>>>>>                     { "combine": "and", "stop list": "default" })
>>>>>>
>>>>>> That way we could have named optional parameters (please ignore the
>>>>>> ugliness of my chosen parameters), which avoid the problem of dealing
>>>>>> with positions. We do have a nested data model, so we could put it to
>>>>>> good use here :)
>>>>>> Does this make sense?
>>>>>>
>>>>>> Cheers,
>>>>>> Till
>>>>>>
>>>>>> On 15 Sep 2016, at 16:26, Taewoo Kim wrote:
>>>>>>
>>>>>>> @Till: we could add parameters such as whether the given search is an
>>>>>>> AND or OR search, the stop list, and/or the stemming method. For
>>>>>>> example, if we use ftcontains(), then it might look like:
>>>>>>>
>>>>>>> 1) where ftcontains($o.title, "hello"): find $o where the title field
>>>>>>> contains hello.
>>>>>>> 2) where ftcontains($o.title, ["hello","hi"], any): find $o where the
>>>>>>> title field contains hello *and/or* hi.
>>>>>>> 3) where ftcontains($o.title, ["hello","hi"], all): find $o where the
>>>>>>> title field contains both hello *and* hi.
>>>>>>> 4) where ftcontains($o.title, ["hello","hi"], all, defaultstoplist):
>>>>>>> find $o where the title field contains both hello *and* hi, and also
>>>>>>> apply the default stop list to the search. The default stop list
>>>>>>> contains a number of common English words that can be filtered out.
>>>>>>>
>>>>>>> The issue here is that the position of each parameter must be
>>>>>>> observed (e.g., the third one indicates whether we do a disjunctive or
>>>>>>> conjunctive search, and the fourth one tells us which stop list we
>>>>>>> use). So, if we have three parameters, how to specify or omit each of
>>>>>>> them becomes a challenge.
>>>>>>>
>>>>>>> Best,
>>>>>>> Taewoo
>>>>>>>
>>>>>>> On Thu, Sep 15, 2016 at 4:12 PM, Till Westmann <tillw@apache.org> wrote:
>>>>>>>
>>>>>>>> Makes sense to me (especially as I always think about this specific
>>>>>>>> one as "ftcontains" :) ).
>>>>>>>>
>>>>>>>> Another thing you mentioned is the parameters that will get added in
>>>>>>>> the future. Could you provide an example of this?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Till
>>>>>>>>
>>>>>>>> On 15 Sep 2016, at 15:37, Taewoo Kim wrote:
>>>>>>>>
>>>>>>>>> Maybe we could come up with a function form: *ftcontains*(). Here,
>>>>>>>>> ft is an abbreviation for full-text. This function replaces
>>>>>>>>> "contains text" in the XQuery spec. An example might be:
>>>>>>>>>
>>>>>>>>> XQuery spec: where $o.title contains text "hello"
>>>>>>>>> AQL: where ftcontains($o.title, "hello")
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Taewoo
>>>>>>>>>
>>>>>>>>> On Thu, Sep 15, 2016 at 3:18 PM, Taewoo Kim <wangsaeu@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> @Till: Got it. I agree with you. The issue here for full-text
>>>>>>>>>> search is that many function parameters that control the behavior
>>>>>>>>>> of full-text search will be added in the future. Maybe this is not
>>>>>>>>>> an issue? :-)
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Taewoo
>>>>>>>>>>
>>>>>>>>>> On Thu, Sep 15, 2016 at 3:11 PM, Till Westmann <tillw@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I think that our challenge here is that XQuery is very liberal in
>>>>>>>>>>> introducing new keywords, as its grammar is keyword-free. However,
>>>>>>>>>>> it often uses combinations of words such as "contains" "text" to
>>>>>>>>>>> disambiguate. AQL, on the other hand, is not keyword-free, so each
>>>>>>>>>>> time we introduce a new keyword, we create a backwards
>>>>>>>>>>> compatibility problem. It seems that for AQL, using a
>>>>>>>>>>> function-based syntax would create fewer problems.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Till
>>>>>>>>>>>
>>>>>>>>>>> On 2 Mar 2016, at 18:25, Taewoo Kim wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello All,
>>>>>>>>>>>>
>>>>>>>>>>>> I would like to suggest renaming a current function. I am
>>>>>>>>>>>> currently working on full-text search features. The XQuery
>>>>>>>>>>>> Full-Text spec [1] states that the syntax for a full-text search
>>>>>>>>>>>> is *RangeExpr ( "contains" "text" FTSelection FTIgnoreOption? )?*.
>>>>>>>>>>>> As you can see, we are going to use "contains text something".
>>>>>>>>>>>> We already have a contains() function [2] that does a substring
>>>>>>>>>>>> match. So, in order to remove possible ambiguities between the two
>>>>>>>>>>>> features, *contains()* will be renamed to *string-contains()* when
>>>>>>>>>>>> I merge my index-only branch into master, if there is no strong
>>>>>>>>>>>> opinion against this. I will send another note as my merge
>>>>>>>>>>>> progresses. Thank you.
>>>>>>>>>>>>
>>>>>>>>>>>> [1] https://www.w3.org/TR/xpath-full-text-10/#doc-xquery10-FTContainsExpr
>>>>>>>>>>>>
>>>>>>>>>>>> [2] https://asterix-jenkins.ics.uci.edu/job/asterix-test-full/site/asterix-doc/aql/functions.html#StringFunctions
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Taewoo
>>>>>>>>>>>>
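To make the semantics discussed in this thread concrete, here is a minimal Python sketch of the two behaviors being separated: the existing substring match of contains() (proposed as string-contains()) and a token-level ftcontains(expression1, expression2, parameter record). It is not AsterixDB code. The parameter names "mode" ("any"/"all") and "stop list" ("default") are taken from the examples above; the tokenizer, the stop-list contents, the "any" default, and returning false when no keywords remain are all assumptions made for illustration.

```python
import re
from typing import Iterable, Mapping, Optional, Union

# Hypothetical default stop list; the thread only says that the default
# list contains a number of common English words.
DEFAULT_STOP_LIST = frozenset({"a", "an", "and", "in", "of", "or", "the", "to"})

# Hypothetical defaults for the named optional parameters.
DEFAULT_PARAMS = {"mode": "any", "stop list": None}


def string_contains(text: str, substring: str) -> bool:
    """Substring match, i.e. the behavior of the existing contains()."""
    return substring in text


def tokenize(text: str) -> list:
    """Assumed tokenizer: lowercase runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ftcontains(
    text: str,
    keywords: Union[str, Iterable[str]],
    params: Optional[Mapping[str, object]] = None,
) -> bool:
    """Keyword (token) match controlled by a record of named parameters."""
    # Overlay the caller's record on the defaults: any parameter may be
    # omitted, and the ones given may appear in any order.
    options = {**DEFAULT_PARAMS, **(params or {})}
    unknown = set(options) - set(DEFAULT_PARAMS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")

    if isinstance(keywords, str):
        keywords = [keywords]
    wanted = {tok for kw in keywords for tok in tokenize(kw)}
    if options["stop list"] == "default":
        wanted -= DEFAULT_STOP_LIST  # drop stop words from the query
    if not wanted:
        return False  # assumed: nothing left to search for

    tokens = set(tokenize(text))
    if options["mode"] == "any":  # disjunctive search
        return any(w in tokens for w in wanted)
    if options["mode"] == "all":  # conjunctive search
        return wanted <= tokens
    raise ValueError(f"unsupported mode: {options['mode']!r}")


if __name__ == "__main__":
    title = "Hello world, hi there"
    print(string_contains("Othello", "hello"))                   # True
    print(ftcontains("Othello", "hello"))                        # False
    print(ftcontains(title, ["hello", "hi"], {"mode": "all"}))   # True
    print(ftcontains(title, ["hello", "bye"], {"mode": "all"}))  # False
    print(ftcontains(title, ["hello", "bye"]))                   # True
    print(ftcontains("a cat", ["the", "cat"],
                     {"mode": "all", "stop list": "default"}))  # True
```

Merging the caller's record over a table of defaults is what lets optional parameters be named rather than positional, which is the problem raised about ftcontains($o.title, ["hello","hi"], all, defaultstoplist); rejecting unknown keys keeps misspelled parameter names from being silently ignored.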
