asterixdb-dev mailing list archives

From Taewoo Kim <wangs...@gmail.com>
Subject Re: Function name change: contains() -> string-contains()
Date Fri, 16 Sep 2016 04:15:27 GMT
If we choose ftcontains() or matches(), then I think we can keep contains()
as it is. And since we are choosing a function form, 'text' doesn't have to
be a reserved keyword. So, at this stage, we don't need to change anything to
integrate the full-text search feature.
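
To make the difference between the two functions concrete, here is a minimal
sketch in Python (used only because it is easy to run). The tokenizer and the
Python function names are assumptions for illustration, not AsterixDB's
implementation: contains() is a substring match, while a full-text
ftcontains() would match whole tokens.

```python
import re

def contains(text: str, pattern: str) -> bool:
    """Substring match, in the spirit of the existing contains()."""
    return pattern in text

def ftcontains(text: str, word: str) -> bool:
    """Hypothetical token-level match.

    Assumption: tokens are lowercase runs of letters and digits.
    A real analyzer would likely be more elaborate.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return word.lower() in tokens

title = "Othello and friends"
print(contains(title, "hello"))    # True: "hello" is a substring of "Othello"
print(ftcontains(title, "hello"))  # False: no token equals "hello"
```

Because the two functions answer different questions, both can coexist under
different names.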

One remaining issue, as Chen said, is how to support many parameters for a
full-text search. I think Till suggested a good option (passing the
parameters as a record).
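
A rough sketch of how parameters-as-a-record could behave, again in Python
and only as an illustration. The option names "combine" and "stop list" come
from Till's example; the default values, the stop-list contents, the
tokenizer, and the helper name resolve_options are all assumptions.

```python
import re

# Assumed defaults; the thread does not fix them.
DEFAULT_OPTIONS = {"combine": "or", "stop list": None}
# Made-up stop-list contents, for illustration only.
DEFAULT_STOP_LIST = {"a", "an", "and", "of", "the"}

def resolve_options(options=None):
    """Merge a caller-supplied record over the defaults, rejecting unknown names."""
    options = options or {}
    unknown = set(options) - set(DEFAULT_OPTIONS)
    if unknown:
        raise ValueError(f"unknown full-text option(s): {sorted(unknown)}")
    resolved = {**DEFAULT_OPTIONS, **options}
    if resolved["combine"] not in ("and", "or"):
        raise ValueError("combine must be 'and' or 'or'")
    return resolved

def ftcontains(text, words, options=None):
    """Hypothetical full-text match driven by an optional record of named options."""
    opts = resolve_options(options)
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    stop = DEFAULT_STOP_LIST if opts["stop list"] == "default" else set()
    # Drop query words that appear in the active stop list.
    wanted = [w.lower() for w in words if w.lower() not in stop]
    hits = [w in tokens for w in wanted]
    return all(hits) if opts["combine"] == "and" else any(hits)

# Any option can be omitted or given in any order, since none is positional.
print(ftcontains("hello there", ["hello", "hi"]))
print(ftcontains("hello there", ["hello", "hi"], {"combine": "and"}))
print(ftcontains("hello world", ["hello", "the"],
                 {"stop list": "default", "combine": "and"}))
```

Since each option is looked up by name, adding a new one later does not
change the meaning of existing calls, which is the position problem that the
record form avoids.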

Best,
Taewoo

On Thu, Sep 15, 2016 at 7:01 PM, Chen Li <chenli@gmail.com> wrote:

> For full-text search, I like "ftcontains()" since it's very intuitive.
>
> Syntax for advanced full-text features such as stop words, analyzers, and
> languages needs a separate discussion.
>
> Chen
>
> On Thu, Sep 15, 2016 at 5:58 PM, Taewoo Kim <wangsaeu@gmail.com> wrote:
>
> > @Till: I see. Thanks for the suggestion. It's clearer now.
> >
> > Best,
> > Taewoo
> >
> > On Thu, Sep 15, 2016 at 5:58 PM, Till Westmann <tillw@apache.org> wrote:
> >
> > > And as it turns out, we already have some infrastructure to translate a
> > > constant record constructor expression into a record in
> > > LangRecordParseUtil.
> > > So supporting that wouldn’t be too painful.
> > >
> > > Cheers,
> > > Till
> > >
> > >
> > > On 15 Sep 2016, at 17:41, Till Westmann wrote:
> > >
> > > >> One option to express those parameters would be to pass in a
> > > >> (compile-time constant) record/object. E.g.
> > >>
> > >>     where ftcontains($o.title, ["hello","hi"],
> > >>                      { "combine": "and", "stop list": "default" })
> > >>
> > > >> That way we could have named optional parameters (please ignore the
> > > >> ugliness of my chosen parameters) which avoid the problem of dealing
> > > >> with positions.
> > > >> We do have a nested datamodel, so we could put it to good use here :)
> > >>
> > >> Does this make sense?
> > >>
> > >> Cheers,
> > >> Till
> > >>
> > >> On 15 Sep 2016, at 16:26, Taewoo Kim wrote:
> > >>
> > > >>> @Till: we can add whether the given search is an AND/OR search, a
> > > >>> stop list, and/or a stemming method. For example, if we use
> > > >>> ftcontains(), then it might look like:
> > > >>>
> > > >>> 1) where ftcontains($o.title, "hello"): find $o where the title field
> > > >>> contains hello.
> > > >>> 2) where ftcontains($o.title, ["hello","hi"], any): find $o where the
> > > >>> title field contains hello *and/or* hi.
> > > >>> 3) where ftcontains($o.title, ["hello","hi"], all): find $o where the
> > > >>> title field contains both hello *and* hi.
> > > >>> 4) where ftcontains($o.title, ["hello","hi"], all, defaultstoplist):
> > > >>> find $o where the title field contains both hello *and* hi. Also
> > > >>> apply the default stop list to the search. The default stop list
> > > >>> contains a number of common English words that can be filtered out.
> > > >>>
> > > >>> The issue here is that the position of each parameter must be
> > > >>> observed (e.g., the third one indicates whether we do a disjunctive
> > > >>> or conjunctive search; the fourth one tells us which stop list to
> > > >>> use). So, if we have three parameters, how to specify/omit them
> > > >>> becomes a challenge.
> > >>>
> > >>> Best,
> > >>> Taewoo
> > >>>
> > > >>> On Thu, Sep 15, 2016 at 4:12 PM, Till Westmann <tillw@apache.org>
> > > >>> wrote:
> > >>>
> > > >>>> Makes sense to me (especially as I always think about this specific
> > > >>>> one as "ftcontains" :) ).
> > > >>>>
> > > >>>> Another thing you mentioned is about the parameters that will get
> > > >>>> added in the future. Could you provide an example for this?
> > >>>>
> > >>>> Cheers,
> > >>>> Till
> > >>>>
> > >>>> On 15 Sep 2016, at 15:37, Taewoo Kim wrote:
> > >>>>
> > > >>>>> Maybe we could come up with a function form - *ftcontains*(). Here,
> > > >>>>> ft is an abbreviation for full-text. This function replaces
> > > >>>>> "contains text" in the XQuery spec. An example might be:
> > > >>>>>
> > > >>>>> XQuery spec: where $o.title contains text "hello"
> > > >>>>> AQL: where ftcontains($o.title, "hello")
> > >>>>>
> > >>>>> Best,
> > >>>>> Taewoo
> > >>>>>
> > >>>>> On Thu, Sep 15, 2016 at 3:18 PM, Taewoo Kim <wangsaeu@gmail.com>
> > >>>>> wrote:
> > >>>>>
> > > >>>>>> @Till: Got it. I agree with you. The issue here for the full-text
> > > >>>>>> search is that many function parameters that control the behavior
> > > >>>>>> of full-text search will be added in the future. Maybe this is not
> > > >>>>>> an issue? :-)
> > >>>>>>
> > >>>>>> Best,
> > >>>>>> Taewoo
> > >>>>>>
> > >>>>>> On Thu, Sep 15, 2016 at 3:11 PM, Till Westmann <tillw@apache.org>
> > >>>>>> wrote:
> > >>>>>>
> > > >>>>>>> Hi,
> > > >>>>>>>
> > > >>>>>>> I think that our challenge here is that XQuery is very liberal in
> > > >>>>>>> the introduction of new keywords, as the grammar is keyword free.
> > > >>>>>>> However, they often use combinations of words ("contains" "text")
> > > >>>>>>> to disambiguate.
> > > >>>>>>> AQL on the other hand is not keyword free and so each time we
> > > >>>>>>> introduce a new one, we create a backwards compatibility problem.
> > > >>>>>>> It seems that for AQL using a function-based syntax would create
> > > >>>>>>> fewer problems.
> > >>>>>>>
> > >>>>>>> Cheers,
> > >>>>>>> Till
> > >>>>>>>
> > >>>>>>> On 2 Mar 2016, at 18:25, Taewoo Kim wrote:
> > >>>>>>>
> > > >>>>>>>> Hello All,
> > > >>>>>>>>
> > > >>>>>>>> I would like to suggest changing the name of an existing
> > > >>>>>>>> function. I am currently working on Full Text Search features.
> > > >>>>>>>> The XQuery Full-text search spec [1] states that for a full-text
> > > >>>>>>>> search, the syntax is *RangeExpr ( "contains" "text" FTSelection
> > > >>>>>>>> FTIgnoreOption? )?*. As you see, we are going to use "contains
> > > >>>>>>>> text something". And we already have a contains() function [2]
> > > >>>>>>>> that does a substring match. So, in order to remove possible
> > > >>>>>>>> ambiguities between the two features, *contains()* will be
> > > >>>>>>>> renamed to *string-contains()* when I merge my index-only branch
> > > >>>>>>>> to the master if there is no strong opinion on this. I will send
> > > >>>>>>>> another note as my merge progresses. Thank you.
> > >>>>>>>>
> > > >>>>>>>> [1] https://www.w3.org/TR/xpath-full-text-10/#doc-xquery10-FTContainsExpr
> > > >>>>>>>>
> > > >>>>>>>> [2] https://asterix-jenkins.ics.uci.edu/job/asterix-test-full/site/asterix-doc/aql/functions.html#StringFunctions
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> Taewoo
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >
> > >
> > >
> >
>
