nifi-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Joe Skora <jsk...@gmail.com>
Subject Re: [discuss] Proposed addition of replaceRegex to expression language
Date Thu, 26 May 2016 17:28:18 GMT
Got it.  That makes sense and provides all the variations of functionality
so it sounds like a win!

On Thu, May 26, 2016 at 1:21 PM, Andy LoPresto <alopresto@apache.org> wrote:

> Hi Joe,
>
> I think what you proposed is more than is necessary. replaceAll is
> functionally complete already, as if it is passed a literal expression, it
> will compile it as such and replace all instances in the subject string. I
> think the confusing point is that the difference between replace and
> replaceAll is not the number of times the replacement is attempted, but
> rather the literal vs. regular expression compilation. As Joe Percivall
> suggested, I think three methods (replace(literal),
> replaceFirst(regex/literal), and replaceAll(regex/literal)) are sufficient.
>
> Having the replaceRegex and replaceLiteral functions with an additional
> parameter to control occurrence count would be difficult, as we would
> either need to expose some kind of enum or perform string matching to
> interpret user-entered flag values.
>
> I don’t think replace and replaceAll need to be EOL-ed, I think the
> documentation just needs to call out the behavior very explicitly, and I
> believe having a new option replaceFirst will also help to suggest the
> differences between the methods even if users do not read the entire
> documentation.
>
>
> Andy LoPresto
> alopresto@apache.org
> *alopresto.apache@gmail.com <alopresto.apache@gmail.com>*
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On May 26, 2016, at 10:09 AM, Joe Skora <jskora@gmail.com> wrote:
>
> I think there is transitional path that is non-breaking and could prevent
> 0.x to 1.x migration issues, even if at the expense of a little extra code
> for a period of time.
>
>   1. Leave the existing "replace" and "replaceAll" functions as is for
>   now.
>   2. Implement the new "replaceRegex" and "replaceLiteral" functions.  I'd
>   prefer giving them a parameter selecting first, last, or all occurrences
>   over function variants, it could even be optional with all as the
> default.
>   3. In a future release deprecate the "replace" and "replaceAll"
>   functions.
>   4. In a future, future release remove the "replace" and "replaceAll"
>
>   functions.
>
> From past experience managing enterprise systems, users preferred a
> transition period like this instead of requiring everything be updated at
> once.  We could then provide them a means to find any outstanding use of
> the old functionality and clean it up before it was retired.  And from the
> system development and management perspective, that saved us a lot of pain
> too since we didn't have an influx of minor but troublesome issues in the
> midst transitions that could demand our attention be on other bigger
> problems.
>
> It would be possible to have a signle function that takes another parameter
> indicating whether the target is a literal or regular expression, but I
> like the separate functions, especially if they will be implemented with
> different underlying APIs.
>
> On Wed, May 25, 2016 at 10:43 PM, Andy LoPresto <alopresto@apache.org>
> wrote:
>
> Joe,
>
> These would be breaking changes and a lot of existing workflows would
> begin to behave differently. I would suggest making an incremental change
> here — simply adding replaceFirst as a non-destructive change as a solution
> for this issue, and opening a new Jira for the changes which break backward
> compatibility.
>
> I do agree that option 1 is probably the cleaner way forward, and if we
> introduce new method names, we may be able to use StandardFlowSynchronizer
> to detect legacy methods from a pre-1.0 flow and update them automatically
> during flow migration.
>
> Andy LoPresto
> alopresto@apache.org
> *alopresto.apache@gmail.com <alopresto.apache@gmail.com>*
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
> On May 25, 2016, at 4:06 PM, Joe Percivall <joepercivall@yahoo.com.INVALID
> <joepercivall@yahoo.com.invalid>
> <joepercivall@yahoo.com.invalid>> wrote:
>
> Andy,
>
> Nice write-up and thanks for bringing attention to this. I definitely
> assumed for a while that replace vs replaceAll was the number of things
> replaced. The underlying problem, I think, is that these EL methods are
> just wrappers around the Java String methods and the Java String methods
> are named in a confusing manner.
>
> I am on board with adding a "replaceFirst(regex, replacement)" method.
> This adds a bit more functionality and is just exposing another Java String
> method.
>
> In addition to that, I would suggest doing something to alleviate the
> confusion between "replace" and "replaceAll". In a similar fashion to
> adding decimal support, I see two avenues we could take:
>
> 1. Change the names of the functions to "replaceLiteral" and
> "replaceRegex" (or "replaceAllLiteral" and "replaceAllRegex")
> 2. Remove one of the methods and add a third field to the remaining method
> to indicate whether to replace literally or interpret as a regex
>
> Both of these would be breaking changes and introduced with 1.0 and I am
> leaning towards option 1 with the base name "replace". I believe when the
> "replaceFirst" method is added, "replaceLiteral" and "replaceRegex" would
> be easy to understand that they replace all occurrences.
>
> Joe
>
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: joepercivall@yahoo.com
>
>
>
> On Wednesday, May 25, 2016 6:24 PM, Andy LoPresto <alopresto@apache.org>
> wrote:
>
>
>
> Hi all,
>
> During investigation of an expression language issue posted to the list, I
> discovered that replace explicitly delegates to a String#replace invocation
> that only accepts literal expressions, not regular expressions, while
> replaceAll accepts regular expressions. I thought this was an oversight and
> filed NIFI-1919 [1] to document and fix this, by changing the
> ReplaceEvaluator [2] to use String#replaceFirst, which accepts regular
> expressions. I wrote failing unit tests [3] to capture the fix. After
> implementing the change, two existing unit tests [4] broke, which indicated
> a regression. At first, I believed these two tests to be incorrect, but
> further investigation showed they were merely surprising.
>
> TestQuery#testQuotingQuotes (below) fails on the second verifyEquals call,
> but the test is asserting that replace should replace multiple instances of
> the single quote. While this is similar to String#replace, because the
> expression language exposes only two methods — replace vs. replaceAll — one
> could easily assume the difference between the two was the number of
> attempted replacements, rather than the actual difference, which is literal
> expression vs. pattern.
>
> @Test
> public void testQuotingQuotes() {
> final Map<String, String> attributes = new HashMap<>();
> attributes.put("xx", "say 'hi'");
>
> String query = "${xx:replaceAll( \"'.*'\", '\\\"hello\\\"' )}";
> verifyEquals(query, attributes, "say \"hello\"");
>
> query = "${xx:replace( \"'\", '\"')}";
> verifyEquals(query, attributes, "say \"hi\"");
>
> query = "${xx:replace( '\\'', '\"')}";
> System.out.println(query);
> verifyEquals(query, attributes, "say \"hi\"");
> }
> TestQuery#testReplaceAllWithOddNumberOfBackslashPairs (below) also fails
> on the first verifyEquals call with a PatternSyntaxException. I am
> investigating that further.
>
> @Test
> public void testReplaceAllWithOddNumberOfBackslashPairs() {
> final Map<String, String> attributes = new HashMap<>();
> attributes.put("filename", "C:\\temp\\.txt");
>
> verifyEquals("${filename:replace('\\\\', '/')}", attributes,
> "C:/temp/.txt");
> verifyEquals("${filename:replaceAll('\\\\\\\\', '/')}", attributes,
> "C:/temp/.txt");
> verifyEquals("${filename:replaceAll('\\\\\\.txt$', '')}", attributes,
> "C:\\temp");
> }
> While I originally had just modified replace, after looking at the EL
> documentation [5], replace is explicitly documented to only replace literal
> expressions, and does so multiple times, as does Java’s String#replace [6].
> I now propose to add another method replaceFirst, which accepts a pattern
> and replaces only the first occurrence. I will update the unit tests to
> properly capture this, and will update the documentation to reflect the new
> method.
>
> Thoughts from the community?
>
> [1] https://issues.apache.org/jira/browse/NIFI-1919
> [2]
>
> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java
> [3]
>
> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy
> [4]
>
> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java
> [5]
>
> https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace
> [6]
>
> https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence)
>
>
>
> Andy LoPresto
> alopresto@apache.org
> alopresto.apache@gmail.com
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message