pig-dev mailing list archives

From "Daniel Dai (Resolved) (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (PIG-2514) REGEX_EXTRACT not returning correct group with non greedy regex
Date Fri, 02 Mar 2012 00:08:59 GMT

     [ https://issues.apache.org/jira/browse/PIG-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-2514.
-----------------------------

      Resolution: Fixed
    Hadoop Flags: Reviewed

Unit tests pass. test-patch:
     [exec] -1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec] 
     [exec]     -1 javadoc.  The javadoc tool appears to have generated 1 warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     -1 release audit.  The applied patch generated 535 release audit warnings (more than the trunk's current 530 warnings).

The javadoc and release audit warnings are unrelated.

Patch committed to trunk.

Thanks Romain!
                
> REGEX_EXTRACT not returning correct group with non greedy regex
> ---------------------------------------------------------------
>
>                 Key: PIG-2514
>                 URL: https://issues.apache.org/jira/browse/PIG-2514
>             Project: Pig
>          Issue Type: Bug
>          Components: internal-udfs
>    Affects Versions: 0.11
>            Reporter: Romain Rigaux
>            Assignee: Romain Rigaux
>            Priority: Minor
>             Fix For: 0.11
>
>         Attachments: PIG-2514-doc.patch, PIG-2514.2.patch, PIG-2514.patch
>
>
> Hello,
> REGEX_EXTRACT uses Matcher.find() instead of Matcher.matches(), so it does not work with some non-greedy regular expressions.
> Is this the intended behavior?
> Thanks,
> Romain
> http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Matcher.html
> The matches method attempts to match the entire input sequence against the pattern.
> The find method scans the input sequence looking for the next subsequence that matches the pattern.
>     System.out.println("Pig's way with m.find()");
>     String a = "hdfs://mygrid.com/projects/";
>     Matcher m = Pattern.compile("(.+?)/?").matcher(a);
>     System.out.println(m.find());
>     System.out.println(m.group(1));
>     System.out.println(m.start());
>     System.out.println(m.end());
>     System.out.println("\nm.matches()");
>     a = "hdfs://mygrid.com/projects/";
>     m = Pattern.compile("(.+?)/?").matcher(a);
>     System.out.println(m.matches());
>     System.out.println(m.group(1));
>     System.out.println(m.start());
>     System.out.println(m.end());
>     System.out.println("\nREGEX_EXTRACT m.find()");
>     Tuple t = TupleFactory.getInstance().newTuple();
>     t.append(a);
>     t.append("(.+?)/?");
>     t.append(1);
>     System.out.println(new TestPigExtractAll().new REGEX_EXTRACT().exec(t));
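
The difference described in the report can be reproduced with java.util.regex alone, without Pig. The following is a minimal sketch under that assumption; the class name FindVsMatches and its two helper methods are hypothetical and are not part of Pig's REGEX_EXTRACT implementation. With the reluctant group "(.+?)" followed by the optional "/?", find() is satisfied by the shortest possible match at the start of the input, while matches() forces the group to expand until the whole input is consumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVsMatches {
    // Group 1 as seen by Matcher.find(): the first matching subsequence wins,
    // so a reluctant quantifier stops as early as it can.
    static String groupWithFind(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Group 1 as seen by Matcher.matches(): the entire input must match,
    // so a reluctant quantifier is forced to expand to the end.
    static String groupWithMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "hdfs://mygrid.com/projects/";
        String regex = "(.+?)/?";
        System.out.println(groupWithFind(regex, input));    // prints "h"
        System.out.println(groupWithMatches(regex, input)); // prints "hdfs://mygrid.com/projects"
    }
}
```

Both helpers return null when the pattern does not match, which keeps the comparison free of IllegalStateException from calling group() on a failed match.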

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
