lucene-dev mailing list archives

From Steven A Rowe <sar...@syr.edu>
Subject RE: Regex causes stack overflow!
Date Mon, 18 Jun 2012 05:48:46 GMT
Using Dawid's suggestions, I was able to reproduce the stack overflow with my little program
using thread stack cmdline option values lower than -Xss450k (on Win7-64, using both Java
6 and 7 64-bit Oracle JVMs).
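
The search for the threshold can be sketched in-process as below. This is not the procedure used above, which relaunched the JVM with different -Xss values; it assumes Java 8+ and uses the Thread constructor's stackSize argument, which the Javadoc describes as a platform-dependent hint. The names StackSearch, survives and smallestPassing, the synthetic log and the 104k..8192k bounds are all made up for illustration.

```java
import java.util.function.LongPredicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StackSearch {
    /** Runs every find() on a thread created with the requested stack size; false on overflow. */
    static boolean survives(Pattern pattern, CharSequence input, long stackBytes) {
        boolean[] ok = {false};
        Thread probe = new Thread(null, () -> {
            try {
                Matcher m = pattern.matcher(input);
                while (m.find()) {
                    // exhaust all matches, as the reproduction program does
                }
                ok[0] = true;
            } catch (StackOverflowError e) {
                // ok[0] stays false
            }
        }, "regex-probe", stackBytes); // stackBytes is only a hint to the JVM
        probe.start();
        try {
            probe.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return ok[0];
    }

    /** Smallest value in [lo, hi] that passes, assuming passing is monotonic and hi passes. */
    static long smallestPassing(long lo, long hi, LongPredicate passes) {
        while (lo < hi) {
            long mid = lo + (hi - lo) / 2;
            if (passes.test(mid)) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // First clause of the multiline regex, as quoted later in this thread.
        Pattern firstClause = Pattern.compile(
            "(?:.*\\[javac\\].*\\r?\\n)*.*\\[javac\\]\\s+[1-9]\\d*\\s+error.*\\r?\\n");
        // Synthetic stand-in for a build log (an assumption, not the real Jenkins log).
        StringBuilder log = new StringBuilder();
        for (int i = 0; i < 5000; i++) {
            log.append("    [javac] synthetic warning line ").append(i).append('\n');
        }
        log.append("    [javac] 1 error\n");

        long loKb = 104, hiKb = 8192; // hypothetical search bounds
        if (!survives(firstClause, log, hiKb * 1024)) {
            System.out.println("still overflows at " + hiKb + "k; raise the upper bound");
            return;
        }
        long kb = smallestPassing(loKb, hiKb, k -> survives(firstClause, log, k * 1024));
        System.out.println("smallest passing stack size: about " + kb + "k");
    }
}
```

The number this prints will vary with JVM, OS and input, so it is only a rough guide to the -Xss value at which a real run starts to fail.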

I commented out each alternate clause and then performed the same binary search, and found
that commenting out only the first clause allowed the program to succeed with the minimum
allowed stack size: -Xss104k.
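
A rough sketch of that clause-by-clause elimination, for anyone who wants to repeat it without editing the source by hand: rebuild the alternation with one clause left out each time, and run it on a thread with a small stack. The class and method names, the 256k stack size and the synthetic log are assumptions for illustration; only three of the clauses are included, and the Thread stackSize argument is a platform-dependent hint rather than an exact equivalent of -Xss.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ClauseElimination {
    // Three of the alternation's clauses, copied from the program quoted later in this thread.
    static final List<String> CLAUSES = Arrays.asList(
        "(?:.*\\[javac\\].*\\r?\\n)*.*\\[javac\\]\\s+[1-9]\\d*\\s+error.*\\r?\\n",
        "(?:.*\\[javadoc\\].*\\r?\\n)*.*\\[javadoc\\]\\s*[1-9]\\d*\\s+warnings.*\\r?\\n",
        ".*BUILD\\s+FAILED(?s:.*)");

    /** Joins every clause except the one at index skip into a single alternation. */
    static String withoutClause(List<String> clauses, int skip) {
        List<String> kept = new ArrayList<>(clauses);
        kept.remove(skip); // remove(int) removes by index
        return "(?:" + String.join("|", kept) + ")";
    }

    /** Runs every find() on a thread with the requested stack size; false on overflow. */
    static boolean survives(Pattern pattern, CharSequence input, long stackBytes) {
        boolean[] ok = {false};
        Thread probe = new Thread(null, () -> {
            try {
                Matcher m = pattern.matcher(input);
                while (m.find()) {
                    // exhaust all matches
                }
                ok[0] = true;
            } catch (StackOverflowError e) {
                // ok[0] stays false
            }
        }, "clause-probe", stackBytes);
        probe.start();
        try {
            probe.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return ok[0];
    }

    public static void main(String[] args) {
        // Synthetic stand-in for a failed build log (an assumption).
        StringBuilder log = new StringBuilder();
        for (int i = 0; i < 5000; i++) {
            log.append("    [javac] synthetic warning line ").append(i).append('\n');
        }
        log.append("    [javac] 1 error\nBUILD FAILED\n");

        long stackBytes = 256L * 1024; // hypothetical small stack
        for (int i = 0; i < CLAUSES.size(); i++) {
            Pattern p = Pattern.compile(withoutClause(CLAUSES, i));
            System.out.println("without clause " + i + ": "
                + (survives(p, log, stackBytes) ? "ok" : "StackOverflowError"));
        }
    }
}
```

If the run only passes when one particular clause is left out, that clause is the likely source of the deep recursion.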

I was able to uncomment all clauses and also use the minimum allowed stack size by applying
possessive quantifiers -- '+' after a repetition operator, e.g. '*+' -- to multiline matching
portions of the clauses.  E.g. the worst offending first clause

	(?:.*\\[javac\\].*\\r?\\n)*.*\\[javac\\]\\s+[1-9]\\d*\\s+error.*\\r?\\n

became:

	(?:.*\\[javac\\]\\s++(?![1-9]\\d*\\s+error).*\\r?\\n)*+
	   .*\\[javac\\]\\s+[1-9]\\d*\\s+error.*\\r?\\n

This prevents backtracking into the possessively quantified portions and eliminates the
associated stack frames.
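
As a quick sanity check of the rewrite, the sketch below runs the original and the possessive form of the first clause over a tiny made-up log excerpt and compares what each one matches. The class name, the helper and the sample text are illustrative assumptions; the two patterns are the ones quoted above. It only checks equivalence on this one sample, not stack usage.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PossessiveDemo {
    // Original first clause: greedy group, free to backtrack line by line.
    static final Pattern GREEDY = Pattern.compile(
        "(?:.*\\[javac\\].*\\r?\\n)*.*\\[javac\\]\\s+[1-9]\\d*\\s+error.*\\r?\\n");

    // Rewritten first clause: the negative lookahead keeps the group from consuming
    // the "N error" line, so the possessive *+ never needs to give a line back.
    static final Pattern POSSESSIVE = Pattern.compile(
        "(?:.*\\[javac\\]\\s++(?![1-9]\\d*\\s+error).*\\r?\\n)*+"
        + ".*\\[javac\\]\\s+[1-9]\\d*\\s+error.*\\r?\\n");

    // Made-up log excerpt for illustration.
    static final String SAMPLE =
        "compile-core:\n"
        + "    [javac] Compiling 3 source files\n"
        + "    [javac] Foo.java:10: cannot find symbol\n"
        + "    [javac] 1 error\n"
        + "BUILD FAILED\n";

    /** Returns the first match of p in s, or null if there is none. */
    static String firstMatch(Pattern p, CharSequence s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String greedy = firstMatch(GREEDY, SAMPLE);
        String possessive = firstMatch(POSSESSIVE, SAMPLE);
        System.out.println("same match: " + (greedy != null && greedy.equals(possessive)));
    }
}
```

On this sample both patterns match the three [javac] lines. The lookahead is what makes the possessive quantifier safe here: without it, the group would swallow the error line and the trailing part of the clause could never match.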

I've un-reverted the BUILD_LOG_MULTILINE_REGEX content token in the email notification
configuration on Uwe's Jenkins instance and applied the changes described above to it; if that
goes okay, I'll do the same for the email notification configuration for our jobs on the
Apache Jenkins instance.

Steve

-----Original Message-----
From: Steven A Rowe [mailto:sarowe@syr.edu] 
Sent: Sunday, June 17, 2012 3:16 PM
To: dev@lucene.apache.org
Subject: RE: Regex causes stack overflow!

Good ideas, thanks Dawid. - Steve

-----Original Message-----
From: dawid.weiss@gmail.com [mailto:dawid.weiss@gmail.com] On Behalf Of Dawid Weiss
Sent: Sunday, June 17, 2012 2:45 PM
To: dev@lucene.apache.org
Subject: Re: Regex causes stack overflow!

Not Uwe, but I'd try binary-searching for the stack size at which it starts to overflow. I
don't remember if the default (max) stack size depends on the machine's environment, but if
it does, that may be the cause.

As for the cause/fix for this, try eliminating alternate clauses one by one and figure out
which clause causes this deep recursion. I couldn't tell from the look of it.

We could really make use of those non-backtracking regexps :)

Dawid

On Sun, Jun 17, 2012 at 8:36 PM, Steven A Rowe <sarowe@syr.edu> wrote:
> On 6/17/2012 at 11:42 AM, Uwe Schindler wrote:
>> We had a failed build last night, but no eMail was sent! So I looked 
>> into the server's log, and found the following - something in your 
>> regex seems to cause a stack overflow (this is not shown in the build 
>> log itself, only the server log). The same may happen on Apache's 
>> Jenkins, but I have no access to build logs there.
>
> Hmm, it appears to be affecting Apache's Jenkins too - no email was sent for these failed
> builds: <https://builds.apache.org/job/Solr-4.x/12/> and <https://builds.apache.org/job/Solr-trunk/1887/>.
>
>> Jun 17, 2012 9:35:29 AM hudson.plugins.chucknorris.CordellWalkerRecorder <init>
>> INFO: Chuck Norris is activated
>> Jun 17, 2012 9:34:29 AM hudson.model.Executor run
>> SEVERE: Executor threw an exception
>> java.lang.StackOverflowError
>>       at java.util.regex.Pattern$BranchConn.match(Pattern.java:4078)
>>       at java.util.regex.Pattern$CharProperty.match(Pattern.java:3345)
>>       at java.util.regex.Pattern$Branch.match(Pattern.java:4114)
>>       at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
>>       at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
>>       at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
>>       at java.util.regex.Pattern$BranchConn.match(Pattern.java:4078)
>>       at java.util.regex.Pattern$CharProperty.match(Pattern.java:3345)
>>       at java.util.regex.Pattern$Branch.match(Pattern.java:4114)
>>       at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
>>       at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
> [...]
>
> I downloaded the logs for the two failed jobs
> (Lucene-Solr-trunk-Linux-Java7-64/307 & .../309), and ran the 
> below-listed program against them on Win7 using both Oracle JDK
> 1.6.0_21 and 1.7.0_01 with default settings.  No stack overflow, and 
> it finds and prints out the expected stuff.  (I include the 
> line-counting thing because that's also used by the Jenkins plugin, 
> just in case that might be the problem.)
>
> FYI, the source for the BUILD_LOG_MULTILINE_REGEX functionality is here:
> <https://github.com/jenkinsci/email-ext-plugin/blob/master/src/main/java/hudson/plugins/emailext/plugins/content/BuildLogMultilineRegexContent.java>;
> test suite here:
> <https://github.com/jenkinsci/email-ext-plugin/blob/master/src/test/java/hudson/plugins/emailext/plugins/content/BuildLogMultilineRegexContentTest.java>.
>
> Uwe, do you have any idea how to diagnose what's happening?
>
> Steve
>
> -----------------------
> import java.io.File;
> import java.io.BufferedReader;
> import java.io.FileReader;
> import java.io.IOException;
> import java.util.regex.Pattern;
> import java.util.regex.Matcher;
>
> public class Test {
>    static final Pattern pattern = Pattern.compile("(?x:"
>      +"# Compilation failures\n"
>      +"(?:.*\\[javac\\].*\\r?\\n)*.*\\[javac\\]\\s+[1-9]\\d*\\s+error.*\\r?\\n\n"
>      +"# Test failures\n"
>      +"|.*\\[junit4\\]\\s*Suite:.*[\\r\\n]+.*\\[junit4\\]\\s*(?!Completed)(?!IGNOR)\\S(?s:.*?)<<<\\s*FAILURES!\n"
>      +"# Source file license problems\n"
>      +"|.*rat-sources:.*(?:\\r?\\n.*\\[echo\\].*)*\\s+[1-9]\\d*\\s+Unknown\\s+Licenses.*\\r?\\n(?:.*\\[echo\\].*\\r?\\n)*\n"
>      +"# Third-party dependency license problems - include 2 preceding lines and 1 following line\n"
>      +"|(?:.*\\r?\\n){2}.*\\[licenses\\]\\s+MISSING\\s+sha1(?:.*\\r?\\n){2}\n"
>      +"# Javadoc warnings\n"
>      +"|(?:.*\\[javadoc\\].*\\r?\\n)*.*\\[javadoc\\]\\s*[1-9]\\d*\\s+warnings.*\\r?\\n\n"
>      +"# Other javadocs problems: broken links and missing javadocs\n"
>      +"|.*javadocs-lint:.*\\r?\\n(?:.*\\[echo\\].*\\r?\\n)*\n"
>      +"# Thread dumps - include 1 preceding line and the remainder of the log\n"
>      +"|.*\\r?\\n.*Full\\s+thread\\s+dump(?s:.*)\n"
>      +"# Jenkins problems - include the remainder of the log\n"
>      +"|.*(?:FATAL|ERROR):(?s:.*)\n"
>      +"# Include the Ant call stack - include the remainder of the log\n"
>      +"|.*BUILD\\s+FAILED(?s:.*)\n"
>      +")");
>    static final Pattern lineCountPattern = Pattern.compile("(?<=.)\r?\n");
>
>    public static void main(String[] args) throws IOException {
>        StringBuilder builder = new StringBuilder();
>        File file = new File(args[0]);
>        BufferedReader reader = new BufferedReader(new FileReader(file));
>        String line;
>        while (null != (line = reader.readLine())) {
>            builder.append(line).append("\n");
>        }
>        Matcher matcher = pattern.matcher(builder);
>        while (matcher.find()) {
>            System.err.println("Found: '" + matcher.group() + "'");
>        }
>        matcher = lineCountPattern.matcher(builder);
>        int lineCount = 0;
>        while (matcher.find()) {
>            ++lineCount;
>        }
>        System.err.println("# lines: " + lineCount);
>    }
> }
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For 
> additional commands, e-mail: dev-help@lucene.apache.org
>

