Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id BF7452009F4 for ; Thu, 26 May 2016 19:29:05 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id BE332160A18; Thu, 26 May 2016 17:29:05 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id B54DA160A17 for ; Thu, 26 May 2016 19:29:04 +0200 (CEST) Received: (qmail 20961 invoked by uid 500); 26 May 2016 17:29:03 -0000 Mailing-List: contact dev-help@nifi.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@nifi.apache.org Delivered-To: mailing list dev@nifi.apache.org Received: (qmail 20949 invoked by uid 99); 26 May 2016 17:29:03 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 26 May 2016 17:29:03 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id EE5C31A0809 for ; Thu, 26 May 2016 17:29:02 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.879 X-Spam-Level: * X-Spam-Status: No, score=1.879 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd2-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx2-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id zSUQC7G4fIQ2 for ; Thu, 26 May 2016 17:28:59 +0000 (UTC) Received: from mail-yw0-f174.google.com (mail-yw0-f174.google.com [209.85.161.174]) by mx2-lw-eu.apache.org (ASF Mail Server at mx2-lw-eu.apache.org) with ESMTPS id 84EE25F46D for ; Thu, 26 May 2016 17:28:58 +0000 (UTC) Received: by mail-yw0-f174.google.com with SMTP id c127so83104316ywb.1 for ; Thu, 26 May 2016 10:28:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to; bh=lWRTFT07j9fAsaFlwnZwq8QCe9Cqg1+j+4hI96kUwr8=; b=eGTRurzIM1W/aCjj+tGh1tCQ1wUDvSXMxocZjyfXcUI+3wNZO31t4fYhdDC5BHf4gI R2r0fkQqu6dPA7DDD6fNf6sOI0C7iGUIqAoJ2j19kKwldhoTZ71XgiIKjnwxSEsOyHEH RCi0z7iQmsYvxqFP4CH/39aAFcNL6nODkroGDk5Y5Fl+JVH5DDypM9DaIdiPA8mPNO36 H55j6A+8VaRcUPJgv50/CGo4vedgFVRISWrsJkgLTGRXL07ajK72fcqmfvbkXLZVAAsH R25VTQ4M2hfQOE5LjqUUDVtC03CFHMYwOfBerafl1mqJ2bR5ErwviZS7lQWEBcngZRYk NJEA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to; bh=lWRTFT07j9fAsaFlwnZwq8QCe9Cqg1+j+4hI96kUwr8=; b=QitEeCbEs1cW0xrtCMBFui4s9koOxdEmu1PFhiyetCm+KOG2HCfI68SmLOeiZIjdWR UqH7wSpScAv/LEBPZFfi/RQgYQM8WRgSxbjEcwLO8FTeasAjp2ai1uVlB6jvUvmD1faY l4kjYLNkxC9dbRoTUWVJAItTY23+ZHXwNds9oS1atES7oveh7AajUjpQZE+Z+k/3mcA4 LNcZtnX035uNl8jy8lucHDuEdWultVgXOhuF4LL7AbF6wy86Ctq3mS83vOF7Ma4G8ypJ GYlDkL5k6784yHc4xYgNSegZSEw4z4QD0KnuQAJGgrS4fU7uuLj6RA/I6nxXvQukrF98 RYOw== X-Gm-Message-State: ALyK8tIHP6UftH+tWF1S3g+qxMw666sI1ij/afk77xnNtG6gZemdBXQ76KVw/XD9sdEpwn5rn3dXKg3l8mqzrQ== X-Received: by 10.37.2.72 with SMTP id 69mr3426958ybc.120.1464283737592; Thu, 26 May 2016 10:28:57 -0700 (PDT) MIME-Version: 1.0 Received: by 10.129.113.84 with HTTP; Thu, 26 May 2016 10:28:18 -0700 (PDT) In-Reply-To: <485EE4CD-F0CE-4FBB-B5BC-CBB803EF937F@apache.org> References: <1755015209.627803.1464217582480.JavaMail.yahoo@mail.yahoo.com> <662F3086-80C9-448A-ADF2-D6F700387A62@apache.org> <485EE4CD-F0CE-4FBB-B5BC-CBB803EF937F@apache.org> From: Joe Skora Date: Thu, 26 May 2016 13:28:18 -0400 Message-ID: Subject: Re: [discuss] Proposed addition of replaceRegex to expression language To: dev@nifi.apache.org Content-Type: multipart/alternative; boundary=001a113d4876079e9c0533c220b9 archived-at: Thu, 26 May 2016 17:29:05 -0000 --001a113d4876079e9c0533c220b9 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Got it. That makes sense and provides all the variations of functionality so it sounds like a win! On Thu, May 26, 2016 at 1:21 PM, Andy LoPresto wrote= : > Hi Joe, > > I think what you proposed is more than is necessary. replaceAll is > functionally complete already, as if it is passed a literal expression, i= t > will compile it as such and replace all instances in the subject string. = I > think the confusing point is that the difference between replace and > replaceAll is not the number of times the replacement is attempted, but > rather the literal vs. regular expression compilation. As Joe Percivall > suggested, I think three methods (replace(literal), > replaceFirst(regex/literal), and replaceAll(regex/literal)) are sufficien= t. > > Having the replaceRegex and replaceLiteral functions with an additional > parameter to control occurrence count would be difficult, as we would > either need to expose some kind of enum or perform string matching to > interpret user-entered flag values. > > I don=E2=80=99t think replace and replaceAll need to be EOL-ed, I think t= he > documentation just needs to call out the behavior very explicitly, and I > believe having a new option replaceFirst will also help to suggest the > differences between the methods even if users do not read the entire > documentation. > > > Andy LoPresto > alopresto@apache.org > *alopresto.apache@gmail.com * > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69 > > On May 26, 2016, at 10:09 AM, Joe Skora wrote: > > I think there is transitional path that is non-breaking and could prevent > 0.x to 1.x migration issues, even if at the expense of a little extra cod= e > for a period of time. > > 1. Leave the existing "replace" and "replaceAll" functions as is for > now. > 2. Implement the new "replaceRegex" and "replaceLiteral" functions. I'= d > prefer giving them a parameter selecting first, last, or all occurrence= s > over function variants, it could even be optional with all as the > default. > 3. In a future release deprecate the "replace" and "replaceAll" > functions. > 4. In a future, future release remove the "replace" and "replaceAll" > > functions. > > From past experience managing enterprise systems, users preferred a > transition period like this instead of requiring everything be updated at > once. We could then provide them a means to find any outstanding use of > the old functionality and clean it up before it was retired. And from th= e > system development and management perspective, that saved us a lot of pai= n > too since we didn't have an influx of minor but troublesome issues in the > midst transitions that could demand our attention be on other bigger > problems. > > It would be possible to have a signle function that takes another paramet= er > indicating whether the target is a literal or regular expression, but I > like the separate functions, especially if they will be implemented with > different underlying APIs. > > On Wed, May 25, 2016 at 10:43 PM, Andy LoPresto > wrote: > > Joe, > > These would be breaking changes and a lot of existing workflows would > begin to behave differently. I would suggest making an incremental change > here =E2=80=94 simply adding replaceFirst as a non-destructive change as = a solution > for this issue, and opening a new Jira for the changes which break backwa= rd > compatibility. > > I do agree that option 1 is probably the cleaner way forward, and if we > introduce new method names, we may be able to use StandardFlowSynchronize= r > to detect legacy methods from a pre-1.0 flow and update them automaticall= y > during flow migration. > > Andy LoPresto > alopresto@apache.org > *alopresto.apache@gmail.com * > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69 > > On May 25, 2016, at 4:06 PM, Joe Percivall > > wrote: > > Andy, > > Nice write-up and thanks for bringing attention to this. I definitely > assumed for a while that replace vs replaceAll was the number of things > replaced. The underlying problem, I think, is that these EL methods are > just wrappers around the Java String methods and the Java String methods > are named in a confusing manner. > > I am on board with adding a "replaceFirst(regex, replacement)" method. > This adds a bit more functionality and is just exposing another Java Stri= ng > method. > > In addition to that, I would suggest doing something to alleviate the > confusion between "replace" and "replaceAll". In a similar fashion to > adding decimal support, I see two avenues we could take: > > 1. Change the names of the functions to "replaceLiteral" and > "replaceRegex" (or "replaceAllLiteral" and "replaceAllRegex") > 2. Remove one of the methods and add a third field to the remaining metho= d > to indicate whether to replace literally or interpret as a regex > > Both of these would be breaking changes and introduced with 1.0 and I am > leaning towards option 1 with the base name "replace". I believe when the > "replaceFirst" method is added, "replaceLiteral" and "replaceRegex" would > be easy to understand that they replace all occurrences. > > Joe > > - - - - - - > Joseph Percivall > linkedin.com/in/Percivall > e: joepercivall@yahoo.com > > > > On Wednesday, May 25, 2016 6:24 PM, Andy LoPresto > wrote: > > > > Hi all, > > During investigation of an expression language issue posted to the list, = I > discovered that replace explicitly delegates to a String#replace invocati= on > that only accepts literal expressions, not regular expressions, while > replaceAll accepts regular expressions. I thought this was an oversight a= nd > filed NIFI-1919 [1] to document and fix this, by changing the > ReplaceEvaluator [2] to use String#replaceFirst, which accepts regular > expressions. I wrote failing unit tests [3] to capture the fix. After > implementing the change, two existing unit tests [4] broke, which indicat= ed > a regression. At first, I believed these two tests to be incorrect, but > further investigation showed they were merely surprising. > > TestQuery#testQuotingQuotes (below) fails on the second verifyEquals call= , > but the test is asserting that replace should replace multiple instances = of > the single quote. While this is similar to String#replace, because the > expression language exposes only two methods =E2=80=94 replace vs. replac= eAll =E2=80=94 one > could easily assume the difference between the two was the number of > attempted replacements, rather than the actual difference, which is liter= al > expression vs. pattern. > > @Test > public void testQuotingQuotes() { > final Map attributes =3D new HashMap<>(); > attributes.put("xx", "say 'hi'"); > > String query =3D "${xx:replaceAll( \"'.*'\", '\\\"hello\\\"' )}"; > verifyEquals(query, attributes, "say \"hello\""); > > query =3D "${xx:replace( \"'\", '\"')}"; > verifyEquals(query, attributes, "say \"hi\""); > > query =3D "${xx:replace( '\\'', '\"')}"; > System.out.println(query); > verifyEquals(query, attributes, "say \"hi\""); > } > TestQuery#testReplaceAllWithOddNumberOfBackslashPairs (below) also fails > on the first verifyEquals call with a PatternSyntaxException. I am > investigating that further. > > @Test > public void testReplaceAllWithOddNumberOfBackslashPairs() { > final Map attributes =3D new HashMap<>(); > attributes.put("filename", "C:\\temp\\.txt"); > > verifyEquals("${filename:replace('\\\\', '/')}", attributes, > "C:/temp/.txt"); > verifyEquals("${filename:replaceAll('\\\\\\\\', '/')}", attributes, > "C:/temp/.txt"); > verifyEquals("${filename:replaceAll('\\\\\\.txt$', '')}", attributes, > "C:\\temp"); > } > While I originally had just modified replace, after looking at the EL > documentation [5], replace is explicitly documented to only replace liter= al > expressions, and does so multiple times, as does Java=E2=80=99s String#re= place [6]. > I now propose to add another method replaceFirst, which accepts a pattern > and replaces only the first occurrence. I will update the unit tests to > properly capture this, and will update the documentation to reflect the n= ew > method. > > Thoughts from the community? > > [1] https://issues.apache.org/jira/browse/NIFI-1919 > [2] > > https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expres= sion-language/src/main/java/org/apache/nifi/attribute/expression/language/e= valuation/functions/ReplaceEvaluator.java > [3] > > https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expres= sion-language/src/test/groovy/org/apache/nifi/attribute/expression/language= /QueryGroovyTest.groovy > [4] > > https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expres= sion-language/src/test/java/org/apache/nifi/attribute/expression/language/T= estQuery.java > [5] > > https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.htm= l#replace > [6] > > https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(j= ava.lang.CharSequence,%20java.lang.CharSequence) > > > > Andy LoPresto > alopresto@apache.org > alopresto.apache@gmail.com > PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69 > > > --001a113d4876079e9c0533c220b9--