nifi-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andy LoPresto <alopre...@apache.org>
Subject Re: [discuss] Proposed addition of replaceRegex to expression language
Date Fri, 27 May 2016 17:42:48 GMT
After some discussion and further thought, leaving “replace” to handle literal Strings
which contain escape characters may be convenient. In this case I would suggest either renaming
“replace” to “replaceAllLiteral” or the other methods to “replaceFirstRegex” and
“replaceAllRegex” to illustrate the differences.


Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On May 27, 2016, at 10:25 AM, Andy LoPresto <alopresto@apache.org> wrote:
> 
> Following up on this, I am not sure replace is needed at all. ReplaceAll, if called with
a “regular expression” that is just a literal String, will perform the same operation
as replace would.
> 
> In NIFI-1919, I will not remove replace, as that would break BC, but I submit that for
1.0, we should remove it, as it’s behavior is completely covered by another method. We would
then have:
> 
> replaceFirst - accepts a literal or regex
> replaceAll - accepts a literal or regex
> 
> Again, we could perform “flow upgrade” through the StandardFlowSynchronizer to migrate
all uses of “replace” to “replaceAll". Thoughts?
> 
> Andy LoPresto
> alopresto@apache.org <mailto:alopresto@apache.org>
> alopresto.apache@gmail.com <mailto:alopresto.apache@gmail.com>
> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> 
>> On May 26, 2016, at 10:28 AM, Joe Skora <jskora@gmail.com <mailto:jskora@gmail.com>>
wrote:
>> 
>> Got it.  That makes sense and provides all the variations of functionality
>> so it sounds like a win!
>> 
>> On Thu, May 26, 2016 at 1:21 PM, Andy LoPresto <alopresto@apache.org <mailto:alopresto@apache.org>>
wrote:
>> 
>>> Hi Joe,
>>> 
>>> I think what you proposed is more than is necessary. replaceAll is
>>> functionally complete already, as if it is passed a literal expression, it
>>> will compile it as such and replace all instances in the subject string. I
>>> think the confusing point is that the difference between replace and
>>> replaceAll is not the number of times the replacement is attempted, but
>>> rather the literal vs. regular expression compilation. As Joe Percivall
>>> suggested, I think three methods (replace(literal),
>>> replaceFirst(regex/literal), and replaceAll(regex/literal)) are sufficient.
>>> 
>>> Having the replaceRegex and replaceLiteral functions with an additional
>>> parameter to control occurrence count would be difficult, as we would
>>> either need to expose some kind of enum or perform string matching to
>>> interpret user-entered flag values.
>>> 
>>> I don’t think replace and replaceAll need to be EOL-ed, I think the
>>> documentation just needs to call out the behavior very explicitly, and I
>>> believe having a new option replaceFirst will also help to suggest the
>>> differences between the methods even if users do not read the entire
>>> documentation.
>>> 
>>> 
>>> Andy LoPresto
>>> alopresto@apache.org <mailto:alopresto@apache.org>
>>> *alopresto.apache@gmail.com <mailto:alopresto.apache@gmail.com> <alopresto.apache@gmail.com
<mailto:alopresto.apache@gmail.com>>*
>>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>> 
>>> On May 26, 2016, at 10:09 AM, Joe Skora <jskora@gmail.com <mailto:jskora@gmail.com>>
wrote:
>>> 
>>> I think there is transitional path that is non-breaking and could prevent
>>> 0.x to 1.x migration issues, even if at the expense of a little extra code
>>> for a period of time.
>>> 
>>>  1. Leave the existing "replace" and "replaceAll" functions as is for
>>>  now.
>>>  2. Implement the new "replaceRegex" and "replaceLiteral" functions.  I'd
>>>  prefer giving them a parameter selecting first, last, or all occurrences
>>>  over function variants, it could even be optional with all as the
>>> default.
>>>  3. In a future release deprecate the "replace" and "replaceAll"
>>>  functions.
>>>  4. In a future, future release remove the "replace" and "replaceAll"
>>> 
>>>  functions.
>>> 
>>> From past experience managing enterprise systems, users preferred a
>>> transition period like this instead of requiring everything be updated at
>>> once.  We could then provide them a means to find any outstanding use of
>>> the old functionality and clean it up before it was retired.  And from the
>>> system development and management perspective, that saved us a lot of pain
>>> too since we didn't have an influx of minor but troublesome issues in the
>>> midst transitions that could demand our attention be on other bigger
>>> problems.
>>> 
>>> It would be possible to have a signle function that takes another parameter
>>> indicating whether the target is a literal or regular expression, but I
>>> like the separate functions, especially if they will be implemented with
>>> different underlying APIs.
>>> 
>>> On Wed, May 25, 2016 at 10:43 PM, Andy LoPresto <alopresto@apache.org <mailto:alopresto@apache.org>>
>>> wrote:
>>> 
>>> Joe,
>>> 
>>> These would be breaking changes and a lot of existing workflows would
>>> begin to behave differently. I would suggest making an incremental change
>>> here — simply adding replaceFirst as a non-destructive change as a solution
>>> for this issue, and opening a new Jira for the changes which break backward
>>> compatibility.
>>> 
>>> I do agree that option 1 is probably the cleaner way forward, and if we
>>> introduce new method names, we may be able to use StandardFlowSynchronizer
>>> to detect legacy methods from a pre-1.0 flow and update them automatically
>>> during flow migration.
>>> 
>>> Andy LoPresto
>>> alopresto@apache.org <mailto:alopresto@apache.org>
>>> *alopresto.apache@gmail.com <mailto:alopresto.apache@gmail.com> <alopresto.apache@gmail.com
<mailto:alopresto.apache@gmail.com>>*
>>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
>>> 
>>> On May 25, 2016, at 4:06 PM, Joe Percivall <joepercivall@yahoo.com.INVALID
<mailto:joepercivall@yahoo.com.invalid>
>>> <joepercivall@yahoo.com.invalid <mailto:joepercivall@yahoo.com.invalid>>
>>> <joepercivall@yahoo.com.invalid <mailto:joepercivall@yahoo.com.invalid>>>
wrote:
>>> 
>>> Andy,
>>> 
>>> Nice write-up and thanks for bringing attention to this. I definitely
>>> assumed for a while that replace vs replaceAll was the number of things
>>> replaced. The underlying problem, I think, is that these EL methods are
>>> just wrappers around the Java String methods and the Java String methods
>>> are named in a confusing manner.
>>> 
>>> I am on board with adding a "replaceFirst(regex, replacement)" method.
>>> This adds a bit more functionality and is just exposing another Java String
>>> method.
>>> 
>>> In addition to that, I would suggest doing something to alleviate the
>>> confusion between "replace" and "replaceAll". In a similar fashion to
>>> adding decimal support, I see two avenues we could take:
>>> 
>>> 1. Change the names of the functions to "replaceLiteral" and
>>> "replaceRegex" (or "replaceAllLiteral" and "replaceAllRegex")
>>> 2. Remove one of the methods and add a third field to the remaining method
>>> to indicate whether to replace literally or interpret as a regex
>>> 
>>> Both of these would be breaking changes and introduced with 1.0 and I am
>>> leaning towards option 1 with the base name "replace". I believe when the
>>> "replaceFirst" method is added, "replaceLiteral" and "replaceRegex" would
>>> be easy to understand that they replace all occurrences.
>>> 
>>> Joe
>>> 
>>> - - - - - -
>>> Joseph Percivall
>>> linkedin.com/in/Percivall <http://linkedin.com/in/Percivall>
>>> e: joepercivall@yahoo.com <mailto:joepercivall@yahoo.com>
>>> 
>>> 
>>> 
>>> On Wednesday, May 25, 2016 6:24 PM, Andy LoPresto <alopresto@apache.org <mailto:alopresto@apache.org>>
>>> wrote:
>>> 
>>> 
>>> 
>>> Hi all,
>>> 
>>> During investigation of an expression language issue posted to the list, I
>>> discovered that replace explicitly delegates to a String#replace invocation
>>> that only accepts literal expressions, not regular expressions, while
>>> replaceAll accepts regular expressions. I thought this was an oversight and
>>> filed NIFI-1919 [1] to document and fix this, by changing the
>>> ReplaceEvaluator [2] to use String#replaceFirst, which accepts regular
>>> expressions. I wrote failing unit tests [3] to capture the fix. After
>>> implementing the change, two existing unit tests [4] broke, which indicated
>>> a regression. At first, I believed these two tests to be incorrect, but
>>> further investigation showed they were merely surprising.
>>> 
>>> TestQuery#testQuotingQuotes (below) fails on the second verifyEquals call,
>>> but the test is asserting that replace should replace multiple instances of
>>> the single quote. While this is similar to String#replace, because the
>>> expression language exposes only two methods — replace vs. replaceAll — one
>>> could easily assume the difference between the two was the number of
>>> attempted replacements, rather than the actual difference, which is literal
>>> expression vs. pattern.
>>> 
>>> @Test
>>> public void testQuotingQuotes() {
>>> final Map<String, String> attributes = new HashMap<>();
>>> attributes.put("xx", "say 'hi'");
>>> 
>>> String query = "${xx:replaceAll( \"'.*'\", '\\\"hello\\\"' )}";
>>> verifyEquals(query, attributes, "say \"hello\"");
>>> 
>>> query = "${xx:replace( \"'\", '\"')}";
>>> verifyEquals(query, attributes, "say \"hi\"");
>>> 
>>> query = "${xx:replace( '\\'' <smb://''>, '\"')}";
>>> System.out.println(query);
>>> verifyEquals(query, attributes, "say \"hi\"");
>>> }
>>> TestQuery#testReplaceAllWithOddNumberOfBackslashPairs (below) also fails
>>> on the first verifyEquals call with a PatternSyntaxException. I am
>>> investigating that further.
>>> 
>>> @Test
>>> public void testReplaceAllWithOddNumberOfBackslashPairs() {
>>> final Map<String, String> attributes = new HashMap<>();
>>> attributes.put("filename", "C:\\temp\\.txt");
>>> 
>>> verifyEquals("${filename:replace('\\\\', '/')}", attributes,
>>> "C:/temp/.txt");
>>> verifyEquals("${filename:replaceAll('\\\\\\\\', '/')}", attributes,
>>> "C:/temp/.txt");
>>> verifyEquals("${filename:replaceAll('\\\\\\.txt$', '')}", attributes,
>>> "C:\\temp");
>>> }
>>> While I originally had just modified replace, after looking at the EL
>>> documentation [5], replace is explicitly documented to only replace literal
>>> expressions, and does so multiple times, as does Java’s String#replace [6].
>>> I now propose to add another method replaceFirst, which accepts a pattern
>>> and replaces only the first occurrence. I will update the unit tests to
>>> properly capture this, and will update the documentation to reflect the new
>>> method.
>>> 
>>> Thoughts from the community?
>>> 
>>> [1] https://issues.apache.org/jira/browse/NIFI-1919 <https://issues.apache.org/jira/browse/NIFI-1919>
>>> [2]
>>> 
>>> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java
<https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/main/java/org/apache/nifi/attribute/expression/language/evaluation/functions/ReplaceEvaluator.java>
>>> [3]
>>> 
>>> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy
<https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/groovy/org/apache/nifi/attribute/expression/language/QueryGroovyTest.groovy>
>>> [4]
>>> 
>>> https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java
<https://github.com/alopresto/nifi/blob/NIFI-1919/nifi-commons/nifi-expression-language/src/test/java/org/apache/nifi/attribute/expression/language/TestQuery.java>
>>> [5]
>>> 
>>> https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace
<https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replace>
>>> [6]
>>> 
>>> https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence)
<https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence)>
>>> 
>>> 
>>> 
>>> Andy LoPresto
>>> alopresto@apache.org <mailto:alopresto@apache.org>
>>> alopresto.apache@gmail.com <mailto:alopresto.apache@gmail.com>
>>> PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69
> 


Mime
View raw message