asterixdb-dev mailing list archives

From Taewoo Kim <wangs...@gmail.com>
Subject Re: Function name change: contains() -> string-contains()
Date Fri, 16 Sep 2016 20:54:54 GMT
So, in summary, we agree to use a function format for the full-text search,
rather than using XQuery syntax. "contains" doesn't have to be
"string-contains" and "text" doesn't have to be a reserved word.

The possible syntax would be:

*ftcontains*(expression1, expression2, parameter record expression)
*matches*(expression1, expression2, parameter record expression)

Expression1 is the field on which we conduct the full-text search.
Expression2 contains the keywords that will be searched for in
Expression1.
Parameter Record Expression contains the parameters in a record format.

An example could be: ftcontains($o.title, ["hello","hi"], {"mode":"all"})
which checks whether $o.title contains both "hello" and "hi".
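
To make the intended semantics concrete, here is a rough Python sketch (not
AQL, and not the AsterixDB implementation). The word-based tokenization, the
case-insensitive matching, and the default of "any" when no mode is given are
assumptions made for illustration; only the "mode":"all" setting comes from
the example above.

```python
import re


def ftcontains(text, keywords, params=None):
    """Illustrative keyword search over one text field (hypothetical semantics)."""
    # Assumption: the mode defaults to "any" when no parameter record is given.
    mode = (params or {}).get("mode", "any")
    # Accept a single keyword as well as a list of keywords.
    if isinstance(keywords, str):
        keywords = [keywords]
    # Assumption: tokens are lowercased runs of word characters.
    tokens = set(re.findall(r"\w+", text.lower()))
    hits = [kw.lower() in tokens for kw in keywords]
    if mode == "all":
        return all(hits)
    if mode == "any":
        return any(hits)
    raise ValueError(f"unknown mode: {mode!r}")
```

Unlike a substring match, this sketch compares whole tokens, so searching for
"hi" does not match a title that only contains "this".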

Chen mentioned that how to pass parameters needs a separate discussion.
However, for now, passing the parameters in a record is a viable solution,
unless we want to pass each parameter as a separate argument to the function
itself. In that case, it would be harder to remember the position of each
parameter.
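
As a rough illustration of why a record avoids the position problem, the
Python sketch below merges a caller-supplied record with defaults. The
parameter name "stoplist", the default values, and the helper name
resolve_params are hypothetical; the actual parameter set is still open.

```python
# Hypothetical parameter names and defaults, for illustration only.
DEFAULTS = {"mode": "any", "stoplist": None}


def resolve_params(record=None):
    """Merge a parameter record with defaults; any subset may be given, in any order."""
    record = record or {}
    unknown = set(record) - set(DEFAULTS)
    if unknown:
        # Reject misspelled or unsupported parameter names early.
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    return {**DEFAULTS, **record}
```

With this shape, a caller can set only the stop list, e.g.
resolve_params({"stoplist": "default"}), without supplying a placeholder for
the mode, which a purely positional signature would require.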

Best,
Taewoo

On Fri, Sep 16, 2016 at 10:12 AM, Heri Ramampiaro <heriram@gmail.com> wrote:

> +1
>
> -heri
>
> > On Sep 15, 2016, at 19:01, Chen Li <chenli@gmail.com> wrote:
> >
> > For full-text search, I like "ftcontains()" since it's very intuitive.
> >
> > Syntax for advanced full-text features such as stop words, analyzers, and
> > languages need a separate discussion.
> >
> > Chen
> >
> > On Thu, Sep 15, 2016 at 5:58 PM, Taewoo Kim <wangsaeu@gmail.com> wrote:
> >
> >> @Till: I see. Thanks for the suggestion. It's clearer now.
> >>
> >> Best,
> >> Taewoo
> >>
> >> On Thu, Sep 15, 2016 at 5:58 PM, Till Westmann <tillw@apache.org> wrote:
> >>
> >>> And as it turns out, we already have some infrastructure to translate a
> >>> constant record constructor expression into a record in
> >>> LangRecordParseUtil.
> >>> So supporting that wouldn’t be too painful.
> >>>
> >>> Cheers,
> >>> Till
> >>>
> >>>
> >>> On 15 Sep 2016, at 17:41, Till Westmann wrote:
> >>>
> >>>> One option to express those parameters would be to pass in a (compile
> >>>> time constant) record/object. E.g.
> >>>>
> >>>>    where ftcontains($o.title, ["hello","hi"],
> >>>>                     { "combine": "and", "stop list": "default" })
> >>>>
> >>>> That way we could have named optional parameters (please ignore the
> >>>> ugliness of my chosen parameters) which avoid the problem of dealing
> >>>> with positions.
> >>>> We do have a nested datamodel, so we could put it to good use here :)
> >>>>
> >>>> Does this make sense?
> >>>>
> >>>> Cheers,
> >>>> Till
> >>>>
> >>>> On 15 Sep 2016, at 16:26, Taewoo Kim wrote:
> >>>>
> >>>>> @Till: we can add whether the given search is an AND/OR search, the
> >>>>> stop list, and/or the stemming method. For example, if we use
> >>>>> ftcontains(), then it might look like:
> >>>>>
> >>>>> 1) where ftcontains($o.title, "hello"): find $o where the title field
> >>>>> contains hello.
> >>>>> 2) where ftcontains($o.title, ["hello","hi"], any): find $o where the
> >>>>> title field contains hello *and/or* hi.
> >>>>> 3) where ftcontains($o.title, ["hello","hi"], all): find $o where the
> >>>>> title field contains both hello *and* hi.
> >>>>> 4) where ftcontains($o.title, ["hello","hi"], all, defaultstoplist):
> >>>>> find $o where the title field contains both hello *and* hi. Also apply
> >>>>> the default stop list to the search. The default stop list contains a
> >>>>> number of common English words that can be filtered.
> >>>>>
> >>>>> The issue here is that the position of each parameter should be
> >>>>> observed (e.g., the third one indicates whether we do
> >>>>> disjunctive/conjunctive search. The fourth one tells us which stop list
> >>>>> we use). So, if we have three parameters, how to specify/omit these
> >>>>> becomes a challenge.
> >>>>>
> >>>>> Best,
> >>>>> Taewoo
> >>>>>
> >>>>> On Thu, Sep 15, 2016 at 4:12 PM, Till Westmann <tillw@apache.org> wrote:
> >>>>>
> >>>>>> Makes sense to me (especially as I always think about this specific one
> >>>>>> as "ftcontains" :) ).
> >>>>>>
> >>>>>> Another thing you mentioned is about the parameters that will get added
> >>>>>> in the future. Could you provide an example for this?
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Till
> >>>>>>
> >>>>>> On 15 Sep 2016, at 15:37, Taewoo Kim wrote:
> >>>>>>
> >>>>>>> Maybe we could come up with a function form - *ftcontains*(). Here, ft
> >>>>>>> is an abbreviation for full-text. This function replaces "contains
> >>>>>>> text" in the XQuery spec. An example might be:
> >>>>>>>
> >>>>>>> XQuery spec: where $o.title contains text "hello"
> >>>>>>> AQL: where ftcontains($o.title, "hello")
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Taewoo
> >>>>>>>
> >>>>>>> On Thu, Sep 15, 2016 at 3:18 PM, Taewoo Kim <wangsaeu@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> @Till: Got it. I agree with your opinion. The issue here for the
> >>>>>>>> full-text search is that many function parameters that control the
> >>>>>>>> behavior of full-text search will be added in the future. Maybe this
> >>>>>>>> is not the issue? :-)
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Taewoo
> >>>>>>>>
> >>>>>>>> On Thu, Sep 15, 2016 at 3:11 PM, Till Westmann <tillw@apache.org>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I think that our challenge here is that XQuery is very liberal in the
> >>>>>>>>> introduction of new keywords, as the grammar is keyword free. However,
> >>>>>>>>> they often use combinations of words ("contains" "text") to
> >>>>>>>>> disambiguate. AQL on the other hand is not keyword free, and so each
> >>>>>>>>> time we introduce a new one, we create a backwards compatibility
> >>>>>>>>> problem. It seems that for AQL using a function-based syntax would
> >>>>>>>>> create fewer problems.
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>> Till
> >>>>>>>>>
> >>>>>>>>> On 2 Mar 2016, at 18:25, Taewoo Kim wrote:
> >>>>>>>>>
> >>>>>>>>>> Hello All,
> >>>>>>>>>>
> >>>>>>>>>> I would like to suggest a name change for an existing function. I am
> >>>>>>>>>> currently working on Full Text Search features. The XQuery Full-text
> >>>>>>>>>> search spec [1] states that for a full-text search, the syntax is
> >>>>>>>>>> *RangeExpr ( "contains" "text" FTSelection FTIgnoreOption? )?*. As you
> >>>>>>>>>> see, we are going to use "contains text something". And we already
> >>>>>>>>>> have a contains() function [2] that does a substring match. So, in
> >>>>>>>>>> order to remove possible ambiguities between the two features,
> >>>>>>>>>> *contains()* will be renamed to *string-contains()* when I merge my
> >>>>>>>>>> index-only branch to the master if there is no strong opinion on
> >>>>>>>>>> this. I will send another note as my merge progresses. Thank you.
> >>>>>>>>>>
> >>>>>>>>>> [1] https://www.w3.org/TR/xpath-full-text-10/#doc-xquery10-FTContainsExpr
> >>>>>>>>>>
> >>>>>>>>>> [2] https://asterix-jenkins.ics.uci.edu/job/asterix-test-full/site/asterix-doc/aql/functions.html#StringFunctions
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Taewoo
