nifi-issues mailing list archives

From ijokarumawak <...@git.apache.org>
Subject [GitHub] nifi issue #1050: NIFI-2071 - Support repeating capture groups in ExtractTex...
Date Fri, 23 Sep 2016 08:35:19 GMT
Github user ijokarumawak commented on the issue:

    https://github.com/apache/nifi/pull/1050
  
    Thanks @pvillard31 this enhancement would be useful!
    I reviewed the change, played with the unit tests, and have three comments I'd like to share.
    
    ### 1. Change of default behavior
    
    I'm worried about the effect on existing data flows. Since there's no guarantee that nobody relies on the original behavior intentionally, I would prefer to add a new Processor property to enable this feature, such as 'Enable Repeating Capture Groups: true/false', so that current configurations keep working unchanged.
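
    To illustrate the opt-in idea, here is a minimal plain-JDK sketch (not NiFi's actual code). The class `CaptureGroupExtractor` and the `enableRepeating` flag are hypothetical stand-ins for a processor property like the one suggested above. With the flag off, only the first match is used, which preserves the current behavior.

    ```Java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class CaptureGroupExtractor {

        // Hypothetical helper: collects capture groups 1..n from the first match only
        // (current behavior) or from every match when enableRepeating is true (opt-in).
        public static List<String> extractGroups(String regex, String content, boolean enableRepeating) {
            final Matcher matcher = Pattern.compile(regex).matcher(content);
            final List<String> values = new ArrayList<>();
            while (matcher.find()) {
                for (int i = 1; i <= matcher.groupCount(); i++) {
                    values.add(matcher.group(i));
                }
                if (!enableRepeating) {
                    break; // default: keep the existing "first match only" semantics
                }
            }
            return values;
        }

        public static void main(String[] args) {
            System.out.println(extractGroups("(\\w+)=(\\d+)", "a=1,b=10,c=100", false)); // [a, 1]
            System.out.println(extractGroups("(\\w+)=(\\d+)", "a=1,b=10,c=100", true));  // [a, 1, b, 10, c, 100]
        }
    }
    ```

    Defaulting the flag to false means an upgraded flow sees exactly the attributes it saw before unless someone explicitly turns the feature on.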
    
    ### 2. Processor documentation
    
    The commit doesn't update the processor description, which still contains this sentence:
    
    > If the Regular Expression matches more than once, only the first match will be used.
    
    At a minimum, this sentence should be updated.
    
    ### 3. Test case to clarify behavior
    
    I was wondering what happens when multiple capture groups are specified and the regex matches repeatedly. Would you be interested in adding the following test case? Additional documentation on how repeated capture groups are stored under indexed attribute names would also be helpful.
    
    ```Java
        @Test
        public void testFindAllPair() throws Exception {
            final TestRunner testRunner = TestRunners.newTestRunner(new ExtractText());
            final String attributeKey = "regex.result";
            testRunner.setProperty(attributeKey, "(\\w+)=(\\d+)");
            testRunner.enqueue("a=1,b=10,c=100".getBytes("UTF-8"));
            testRunner.run();
            testRunner.assertAllFlowFilesTransferred(ExtractText.REL_MATCH, 1);
            final MockFlowFile out = testRunner.getFlowFilesForRelationship(ExtractText.REL_MATCH).get(0);
            // Ensure the zero capture group is in the resultant attributes
            out.assertAttributeExists(attributeKey + ".0");
            out.assertAttributeExists(attributeKey + ".1");
            out.assertAttributeExists(attributeKey + ".2");
            out.assertAttributeExists(attributeKey + ".3");
            out.assertAttributeExists(attributeKey + ".4");
            out.assertAttributeExists(attributeKey + ".5");
            out.assertAttributeExists(attributeKey + ".6");
            out.assertAttributeNotExists(attributeKey + ".7"); // Ensure there are no more attributes
            out.assertAttributeEquals(attributeKey, "a");
            out.assertAttributeEquals(attributeKey + ".0", "a=1");
            out.assertAttributeEquals(attributeKey + ".1", "a");
            out.assertAttributeEquals(attributeKey + ".2", "1");
            out.assertAttributeEquals(attributeKey + ".3", "b");
            out.assertAttributeEquals(attributeKey + ".4", "10");
            out.assertAttributeEquals(attributeKey + ".5", "c");
            out.assertAttributeEquals(attributeKey + ".6", "100");
        }
    ```
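
    For reference, the attribute layout that this test expects can be sketched in plain Java as follows. This mirrors the expected values in the test, not necessarily the PR's implementation, and `IndexedAttributeNaming` / `toAttributes` are hypothetical names.

    ```Java
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class IndexedAttributeNaming {

        // Hypothetical sketch of the layout asserted in testFindAllPair:
        //   key       -> group 1 of the first match
        //   key.0     -> entire first match (group 0)
        //   key.1..n  -> every capture group of every match, numbered consecutively
        public static Map<String, String> toAttributes(String key, String regex, String content) {
            final Map<String, String> attributes = new LinkedHashMap<>();
            final Matcher matcher = Pattern.compile(regex).matcher(content);
            int index = 1;
            boolean firstMatch = true;
            while (matcher.find()) {
                if (firstMatch) {
                    if (matcher.groupCount() > 0) {
                        attributes.put(key, matcher.group(1));
                    }
                    attributes.put(key + ".0", matcher.group(0));
                    firstMatch = false;
                }
                // Append this match's groups after those of earlier matches
                for (int g = 1; g <= matcher.groupCount(); g++) {
                    attributes.put(key + "." + index++, matcher.group(g));
                }
            }
            return attributes;
        }

        public static void main(String[] args) {
            Map<String, String> attrs = toAttributes("regex.result", "(\\w+)=(\\d+)", "a=1,b=10,c=100");
            attrs.forEach((k, v) -> System.out.println(k + " = " + v));
        }
    }
    ```

    Because the first match's groups still occupy `.1` and `.2`, this numbering appears to leave the attributes produced by the single-match behavior unchanged and only appends later matches after them.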
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
