asterixdb-dev mailing list archives

From Chen Li <che...@gmail.com>
Subject Re: Function name change: contains() -> string-contains()
Date Fri, 16 Sep 2016 02:01:20 GMT
For full-text search, I like "ftcontains()" since it's very intuitive.

Syntax for advanced full-text features such as stop words, analyzers, and
languages needs a separate discussion.

Chen

On Thu, Sep 15, 2016 at 5:58 PM, Taewoo Kim <wangsaeu@gmail.com> wrote:

> @Till: I see. Thanks for the suggestion. It's much clearer now.
>
> Best,
> Taewoo
>
> On Thu, Sep 15, 2016 at 5:58 PM, Till Westmann <tillw@apache.org> wrote:
>
> > And as it turns out, we already have some infrastructure to translate a
> > constant record constructor expression into a record in
> > LangRecordParseUtil.
> > So supporting that wouldn’t be too painful.
> >
> > Cheers,
> > Till
> >
> >
> > On 15 Sep 2016, at 17:41, Till Westmann wrote:
> >
> >> One option to express those parameters would be to pass in a
> >> (compile-time constant) record/object. E.g.
> >>
> >>     where ftcontains($o.title, ["hello","hi"],
> >>                      { "combine": "and", "stop list": "default" })
> >>
> >> That way we could have named optional parameters (please ignore the
> >> ugliness of my chosen parameters) which avoid the problem of dealing with
> >> positions. We do have a nested datamodel, so we could put it to good use
> >> here :)
> >>
> >> Does this make sense?
> >>
> >> Cheers,
> >> Till
> >>
> >> On 15 Sep 2016, at 16:26, Taewoo Kim wrote:
> >>
> >>> @Till: we can add whether the given search is an AND/OR search, the stop
> >>> list, and/or the stemming method. For example, if we use ftcontains(),
> >>> then it might look like:
> >>>
> >>> 1) where ftcontains($o.title, "hello"): find $o where the title field
> >>> contains hello.
> >>> 2) where ftcontains($o.title, ["hello","hi"], any): find $o where the
> >>> title field contains hello *and/or* hi.
> >>> 3) where ftcontains($o.title, ["hello","hi"], all): find $o where the
> >>> title field contains both hello *and* hi.
> >>> 4) where ftcontains($o.title, ["hello","hi"], all, defaultstoplist):
> >>> find $o where the title field contains both hello *and* hi. Also apply
> >>> the default stoplist to the search. The default stop list contains a
> >>> number of common English words that can be filtered.
> >>>
> >>> The issue here is that the position of each parameter must be observed
> >>> (e.g., the third one indicates whether we do a disjunctive or conjunctive
> >>> search; the fourth one tells us which stop list we use). So, if we have
> >>> three parameters, how to specify/omit these becomes a challenge.
> >>>
> >>> Best,
> >>> Taewoo
> >>>
> >>> On Thu, Sep 15, 2016 at 4:12 PM, Till Westmann <tillw@apache.org> wrote:
> >>>
> >>>> Makes sense to me (especially as I always think about this specific one
> >>>> as "ftcontains" :) ).
> >>>>
> >>>> Another thing you mentioned is about the parameters that will get added
> >>>> in the future. Could you provide an example for this?
> >>>>
> >>>> Cheers,
> >>>> Till
> >>>>
> >>>> On 15 Sep 2016, at 15:37, Taewoo Kim wrote:
> >>>>
> >>>>> Maybe we could come up with a function form - *ftcontains*(). Here, ft
> >>>>> is an abbreviation for full-text. This function replaces "contains
> >>>>> text" in the XQuery spec. An example might be:
> >>>>>
> >>>>> XQuery spec: where $o.title contains text "hello"
> >>>>> AQL: where ftcontains($o.title, "hello")
> >>>>>
> >>>>> Best,
> >>>>> Taewoo
> >>>>>
> >>>>> On Thu, Sep 15, 2016 at 3:18 PM, Taewoo Kim <wangsaeu@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> @Till: Got it. I agree with your opinion. The issue here for the
> >>>>>> full-text search is that many function parameters that control the
> >>>>>> behavior of full-text search will be added in the future. Maybe this
> >>>>>> is not the issue? :-)
> >>>>>>
> >>>>>> Best,
> >>>>>> Taewoo
> >>>>>>
> >>>>>> On Thu, Sep 15, 2016 at 3:11 PM, Till Westmann <tillw@apache.org>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I think that our challenge here is that XQuery is very liberal in
> >>>>>>> the introduction of new keywords, as the grammar is keyword free.
> >>>>>>> However, they often use combinations of words ("contains" "text") to
> >>>>>>> disambiguate.
> >>>>>>> AQL on the other hand is not keyword free and so each time we
> >>>>>>> introduce a new one, we create a backwards compatibility problem. It
> >>>>>>> seems that for AQL using a function-based syntax would create fewer
> >>>>>>> problems.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Till
> >>>>>>>
> >>>>>>> On 2 Mar 2016, at 18:25, Taewoo Kim wrote:
> >>>>>>>
> >>>>>>>> Hello All,
> >>>>>>>>
> >>>>>>>> I would like to suggest a name change for a current function. I am
> >>>>>>>> currently working on Full Text Search features. The XQuery Full-text
> >>>>>>>> search spec [1] states that for a full-text search, the syntax is
> >>>>>>>> *RangeExpr ( "contains" "text" FTSelection FTIgnoreOption? )?*. As
> >>>>>>>> you see, we are going to use "contains text something". And we
> >>>>>>>> already have a contains() function [2] that does a substring match.
> >>>>>>>> So, in order to remove possible ambiguities between the two
> >>>>>>>> features, *contains()* will be renamed to *string-contains()* when I
> >>>>>>>> merge my index-only branch to the master if there is no strong
> >>>>>>>> opinion on this. I will send another note as my merge progresses.
> >>>>>>>> Thank you.
> >>>>>>>>
> >>>>>>>> [1] https://www.w3.org/TR/xpath-full-text-10/#doc-xquery10-FTContainsExpr
> >>>>>>>>
> >>>>>>>> [2] https://asterix-jenkins.ics.uci.edu/job/asterix-test-full/site/asterix-doc/aql/functions.html#StringFunctions
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Taewoo
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >
> >
> >
>
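
The trade-off discussed in the quoted thread (positional parameters versus a
single options record with named, optional fields) can be illustrated with a
small sketch. This is not AsterixDB code and does not reflect how
LangRecordParseUtil works. The function names, the default values (including
"or" as the default for "combine"), the tiny stop word set, and the whitespace
tokenizer are all assumptions made for illustration. Only the two option keys,
"combine" and "stop list", are taken from the example in the thread.

```python
# Hypothetical sketch of named optional parameters passed as one record.
# None of these names exist in AsterixDB; defaults are assumptions.
DEFAULT_FT_OPTIONS = {"combine": "or", "stop list": None}
ALLOWED_COMBINE = {"and", "or"}
# Toy stop lists keyed by name; the real default list is not specified here.
STOP_LISTS = {"default": {"the", "a", "an", "of"}}


def resolve_ft_options(options=None):
    """Merge a caller-supplied options record with the defaults.

    Unknown keys and invalid values are rejected, so a misspelled option
    fails loudly instead of being silently ignored.
    """
    options = options or {}
    unknown = set(options) - set(DEFAULT_FT_OPTIONS)
    if unknown:
        raise ValueError(f"unknown full-text option(s): {sorted(unknown)}")
    resolved = {**DEFAULT_FT_OPTIONS, **options}
    if resolved["combine"] not in ALLOWED_COMBINE:
        raise ValueError(f"invalid 'combine' value: {resolved['combine']!r}")
    stop_list = resolved["stop list"]
    if stop_list is not None and stop_list not in STOP_LISTS:
        raise ValueError(f"unknown stop list: {stop_list!r}")
    return resolved


def ftcontains(text, terms, options=None):
    """Toy matcher: lowercase, split on whitespace, test term membership."""
    opts = resolve_ft_options(options)
    if isinstance(terms, str):
        terms = [terms]
    terms = [t.lower() for t in terms]
    if opts["stop list"] is not None:
        # Drop search terms that appear in the chosen stop list.
        stop_words = STOP_LISTS[opts["stop list"]]
        terms = [t for t in terms if t not in stop_words]
    tokens = set(text.lower().split())
    hits = [t in tokens for t in terms]
    return all(hits) if opts["combine"] == "and" else any(hits)
```

With a record, any subset of options can be given in any order, for example
ftcontains(title, ["hello", "hi"], {"stop list": "default"}) without having to
supply "combine" first. That is the property the positional form lacks.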
