From: Yingyi Bu
Date: Fri, 16 Sep 2016 14:06:06 -0700
Subject: Re: Function name change: contains() -> string-contains()
To: dev@asterixdb.apache.org

Cool, +1!

Best,
Yingyi

On Fri, Sep 16, 2016 at 1:54 PM, Taewoo Kim wrote:

> So, in summary, we agree to use a function format for the full-text search,
> rather than using XQuery syntax. "contains" doesn't have to be
> "string-contains" and "text" doesn't have to be a reserved word.
>
> The possible syntax would be:
>
> *ftcontains*(expression1, expression2, parameter record expression)
> *matches*(expression1, expression2, parameter record expression)
>
> Expression1 is the field on which we conduct a full-text search.
> Expression2 contains one or more keywords that will be searched for in
> Expression1.
> Parameter Record Expression contains the parameters in a record format.
>
> An example could be: ftcontains($o.title, ["hello","hi"], {"mode":"all"})
> which checks whether $o.title contains both "hello" and "hi".
>
> Chen mentioned that how to pass parameters needs a separate discussion.
> However, for now, passing the parameters in a record is a viable solution
> unless we want to pass each parameter as a separate argument to the
> function itself, in which case it would be harder to remember the position
> of each parameter.
>
> Best,
> Taewoo
>
> On Fri, Sep 16, 2016 at 10:12 AM, Heri Ramampiaro wrote:
>
> > +1
> >
> > -heri
> >
> > > On Sep 15, 2016, at 19:01, Chen Li wrote:
> > >
> > > For full-text search, I like "ftcontains()" since it's very intuitive.
> > >
> > > Syntax for advanced full-text features such as stop words, analyzers, and
> > > languages needs a separate discussion.
> > >
> > > Chen
> > >
> > > On Thu, Sep 15, 2016 at 5:58 PM, Taewoo Kim wrote:
> > >
> > >> @Till: I see. Thanks for the suggestion. It's clearer now.
> > >>
> > >> Best,
> > >> Taewoo
> > >>
> > >> On Thu, Sep 15, 2016 at 5:58 PM, Till Westmann wrote:
> > >>
> > >>> And as it turns out, we already have some infrastructure to translate a
> > >>> constant record constructor expression into a record in
> > >>> LangRecordParseUtil.
> > >>> So supporting that wouldn’t be too painful.
> > >>>
> > >>> Cheers,
> > >>> Till
> > >>>
> > >>>
> > >>> On 15 Sep 2016, at 17:41, Till Westmann wrote:
> > >>>
> > >>>> One option to express those parameters would be to pass in a
> > >>>> (compile-time constant) record/object. E.g.
> > >>>>
> > >>>> where ftcontains($o.title, ["hello","hi"],
> > >>>> { "combine": "and", "stop list": "default" })
> > >>>>
> > >>>> That way we could have named optional parameters (please ignore the
> > >>>> ugliness of
> > >>>> my chosen parameters) which avoid the problem of dealing with positions.
> > >>>> We do have a nested datamodel, so we could put it to good use here :)
> > >>>>
> > >>>> Does this make sense?
> > >>>>
> > >>>> Cheers,
> > >>>> Till
> > >>>>
> > >>>> On 15 Sep 2016, at 16:26, Taewoo Kim wrote:
> > >>>>
> > >>>>> @Till: we can add whether the given search is an AND/OR search, the
> > >>>>> stop list, and/or the stemming method. For example, if we use
> > >>>>> ftcontains(), then it might look like:
> > >>>>>
> > >>>>> 1) where ftcontains($o.title, "hello"): find $o where the title field
> > >>>>> contains hello.
> > >>>>> 2) where ftcontains($o.title, ["hello","hi"], any): find $o where the
> > >>>>> title field contains hello *and/or* hi.
> > >>>>> 3) where ftcontains($o.title, ["hello","hi"], all): find $o where the
> > >>>>> title field contains both hello *and* hi.
> > >>>>> 4) where ftcontains($o.title, ["hello","hi"], all, defaultstoplist): find
> > >>>>> $o where the title field contains both hello *and* hi. Also apply the
> > >>>>> default stop list to the search. The default stop list contains a number
> > >>>>> of common English words that can be filtered out.
> > >>>>>
> > >>>>> The issue here is that the position of each parameter should be observed
> > >>>>> (e.g., the third one indicates whether we do a disjunctive/conjunctive
> > >>>>> search. The fourth one tells us which stop list we use). So, if we have
> > >>>>> three parameters, how to specify/omit these becomes a challenge.
> > >>>>>
> > >>>>> Best,
> > >>>>> Taewoo
> > >>>>>
> > >>>>> On Thu, Sep 15, 2016 at 4:12 PM, Till Westmann wrote:
> > >>>>>
> > >>>>>> Makes sense to me (especially as I always think about this specific one
> > >>>>>> as "ftcontains" :) ).
> > >>>>>>
> > >>>>>> Another thing you mentioned is about the parameters that will get added
> > >>>>>> in the future. Could you provide an example for this?
> > >>>>>>
> > >>>>>> Cheers,
> > >>>>>> Till
> > >>>>>>
> > >>>>>> On 15 Sep 2016, at 15:37, Taewoo Kim wrote:
> > >>>>>>
> > >>>>>>> Maybe we could come up with a function form - *ftcontains*(). Here, ft
> > >>>>>>> is an abbreviation for full-text. This function replaces "contains text"
> > >>>>>>> in the XQuery spec. An example might be:
> > >>>>>>>
> > >>>>>>> XQuery spec: where $o.title contains text "hello"
> > >>>>>>> AQL: where ftcontains($o.title, "hello")
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Taewoo
> > >>>>>>>
> > >>>>>>> On Thu, Sep 15, 2016 at 3:18 PM, Taewoo Kim wrote:
> > >>>>>>>
> > >>>>>>>> @Till: Got it. I agree with your opinion.
> > >>>>>>>> The issue here for the full-text
> > >>>>>>>> search is that many function parameters that control the behavior of
> > >>>>>>>> full-text search will be added in the future. Maybe this is not the
> > >>>>>>>> issue?
> > >>>>>>>> :-)
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>> Taewoo
> > >>>>>>>>
> > >>>>>>>> On Thu, Sep 15, 2016 at 3:11 PM, Till Westmann <tillw@apache.org>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi,
> > >>>>>>>>>
> > >>>>>>>>> I think that our challenge here is that XQuery is very liberal in
> > >>>>>>>>> the introduction of new keywords, as the grammar is keyword free.
> > >>>>>>>>> However, they often use combinations of words ("contains" "text") to
> > >>>>>>>>> disambiguate.
> > >>>>>>>>> AQL, on the other hand, is not keyword free, and so each time we
> > >>>>>>>>> introduce a new one, we create a backwards compatibility problem.
> > >>>>>>>>> It seems that for AQL, using a function-based syntax would create
> > >>>>>>>>> fewer problems.
> > >>>>>>>>>
> > >>>>>>>>> Cheers,
> > >>>>>>>>> Till
> > >>>>>>>>>
> > >>>>>>>>> On 2 Mar 2016, at 18:25, Taewoo Kim wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Hello All,
> > >>>>>>>>>>
> > >>>>>>>>>> I would like to suggest a change to a current function name. I am
> > >>>>>>>>>> currently working on Full Text Search features. The XQuery Full-text
> > >>>>>>>>>> search spec [1] states that for a full-text search, the syntax is
> > >>>>>>>>>> *RangeExpr ( "contains" "text" FTSelection FTIgnoreOption? )?*.
> > >>>>>>>>>> As you see, we are going to use "contains text something". And we
> > >>>>>>>>>> already have a contains() function [2] that does a substring match.
> > >>>>>>>>>> So, in order to remove possible ambiguities
> > >>>>>>>>>> between the two features, *contains()* will be renamed to
> > >>>>>>>>>> *string-contains()*
> > >>>>>>>>>> when I merge my index-only branch to the master if there is no
> > >>>>>>>>>> strong opinion on this. I will send another note as my merge
> > >>>>>>>>>> progresses. Thank you.
> > >>>>>>>>>>
> > >>>>>>>>>> [1] https://www.w3.org/TR/xpath-full-text-10/#doc-xquery10-FTContainsExpr
> > >>>>>>>>>>
> > >>>>>>>>>> [2]
> > >>>>>>>>>> https://asterix-jenkins.ics.uci.edu/job/asterix-test-full/site/asterix-doc/aql/functions.html#StringFunctions
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Taewoo
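
For readers who want to see why the thread settled on a parameter record rather than positional arguments, the idea can be sketched outside AQL. The Python below is only a toy model of the proposed ftcontains(expression1, expression2, parameter record) shape and is not AsterixDB code. The whitespace tokenizer, the case folding, the "any" default for "mode", and the rejection of unknown option names are all assumptions made for illustration; only the "mode": "all" option and the any/all semantics come from the thread.

```python
from typing import Iterable, Mapping, Optional, Union

# Hypothetical defaults; the thread does not say what the default mode is.
DEFAULT_OPTIONS = {"mode": "any"}


def ftcontains(text: str,
               keywords: Union[str, Iterable[str]],
               options: Optional[Mapping[str, str]] = None) -> bool:
    """Toy model of the proposed ftcontains(); not the AsterixDB implementation."""
    # Named options are merged over the defaults, so callers never have to
    # remember argument positions and can omit anything they do not need.
    opts = dict(DEFAULT_OPTIONS)
    for name, value in (options or {}).items():
        if name not in DEFAULT_OPTIONS:
            raise ValueError(f"unknown option: {name!r}")
        opts[name] = value
    if opts["mode"] not in ("any", "all"):
        raise ValueError(f"unknown mode: {opts['mode']!r}")

    # A single keyword is treated as a one-element list.
    if isinstance(keywords, str):
        keywords = [keywords]

    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokens = set(text.lower().split())
    hits = [keyword.lower() in tokens for keyword in keywords]

    # "all" is the conjunctive search, "any" the disjunctive one.
    return all(hits) if opts["mode"] == "all" else any(hits)


if __name__ == "__main__":
    title = "hello world"
    print(ftcontains(title, "hello"))
    print(ftcontains(title, ["hello", "hi"], {"mode": "any"}))
    print(ftcontains(title, ["hello", "hi"], {"mode": "all"}))
```

With this shape, adding a future option such as a stop list would mean adding one more key to the record, without changing the meaning of any existing call.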