lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Several builds hanging because of permgen
Date Tue, 09 Jul 2013 13:12:40 GMT
I know an easier way to reproduce it:

It’s not the test multiplier, it’s the number of parallel JVMs: if you run with many
CPUs, the test runner spreads the tests across multiple JVMs, so each JVM runs fewer tests. The Jenkins
server on MacOSX runs with one JVM only (because the virtual box has only 2 virtual CPUs),
so all tests have to share the permgen.

 

Something seems to eat all permgen, not by loading classes, but by interning strings. And
that’s the issue here.
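As a sketch of why interning can exhaust a fixed-size region (illustration only, class and value names are made up, not from our tests): the JVM's intern table keeps a canonical copy of every distinct value ever passed to String.intern(), and those copies stay referenced as long as the table does - on older HotSpot JVMs that table lived in the permanent generation.

```java
// Sketch (not from the thread): the intern table pins one canonical copy of
// every distinct value ever interned, so interning an unbounded stream of
// unique strings grows memory that is never reclaimed.
public class InternDemo {
    public static void main(String[] args) {
        String a = new String("permgen");
        String b = new String("permgen");
        System.out.println(a == b);                   // false: two heap objects
        System.out.println(a.intern() == b.intern()); // true: one canonical copy
        // Interning a stream of *unique* values pins each one permanently:
        for (int i = 0; i < 10000; i++) {
            ("unique-value-" + i).intern();
        }
        System.out.println("interned 10000 unique strings");
    }
}
```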

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: uwe@thetaphi.de

 

From: Uwe Schindler [mailto:uwe@thetaphi.de] 
Sent: Tuesday, July 09, 2013 11:48 AM
To: dev@lucene.apache.org
Subject: RE: Several builds hanging because of permgen

 

You need to add -Dtests.multiplier=3 to hit the bug faster; otherwise it depends on the test seed.

 

Uwe

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: uwe@thetaphi.de

 

From: Alan Woodward [mailto:alan@flax.co.uk] 
Sent: Tuesday, July 09, 2013 11:17 AM
To: dev@lucene.apache.org
Subject: Re: Several builds hanging because of permgen

 

SOLR-4948 changed the test's default CoreContainer from using a hardcoded JulWatcher to whatever
would be picked up in the standard config, which will be log4j if that's what's configured
for tests.  Could that be the cause?  When did the permgen crashes start?

 

FWIW, I'm not seeing any permgen errors locally (MacOS, Java 7).

 

Alan Woodward
www.flax.co.uk

 

On 9 Jul 2013, at 08:57, Uwe Schindler wrote:

 

I dug deeper:

In most heap dumps, the failing thread looks like the one below, so Log4J seems to be the bad
guy. I looked at the code; it does tricky stuff to get stack traces, but I found no String.intern()
anywhere.

My question: when did we change the tests to use log4j? Was this before or after the hadoop change?
Maybe it's not related to hadoop at all and is rather a bug in Log4J. I find similar stack traces
all around the internet, always linked to Log4J.

"qtp1522342989-5342" prio=5 tid=5342 RUNNABLE
          at java.lang.OutOfMemoryError.<init>(OutOfMemoryError.java:25)
          at java.lang.Throwable.getStackTraceElement(Native Method)
          at java.lang.Throwable.getOurStackTrace(Throwable.java:591)
          at java.lang.Throwable.getStackTrace(Throwable.java:582)
          at sun.reflect.GeneratedMethodAccessor2.invoke(<unknown string>)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
          at java.lang.reflect.Method.invoke(Method.java:597)
          at org.apache.log4j.spi.LocationInfo.<init>(LocationInfo.java:139)
            Local Variable: org.apache.log4j.spi.LocationInfo#52
            Local Variable: java.lang.String#28067
            Local Variable: java.lang.Throwable#1
          at org.apache.log4j.spi.LoggingEvent.getLocationInformation(LoggingEvent.java:253)
          at org.apache.solr.util.SolrLogLayout._format(SolrLogLayout.java:122)
            Local Variable: java.lang.StringBuilder#32
            Local Variable: java.lang.String#26042
            Local Variable: org.apache.solr.util.SolrLogLayout#1
          at org.apache.solr.util.SolrLogLayout.format(SolrLogLayout.java:110)
          at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:310)
            Local Variable: org.apache.log4j.ConsoleAppender#1
          at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
          at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
          at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
            Local Variable: org.apache.log4j.helpers.AppenderAttachableImpl#1
          at org.apache.log4j.Category.callAppenders(Category.java:206)
            Local Variable: org.apache.log4j.spi.LoggingEvent#52
            Local Variable: org.apache.log4j.Logger#52
          at org.apache.log4j.Category.forcedLog(Category.java:391)
          at org.apache.log4j.Category.log(Category.java:856)
          at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305)
          at org.apache.solr.core.SolrCore.execute(SolrCore.java:1909)
          at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
          at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
            Local Variable: org.apache.solr.handler.StandardRequestHandler#4
            Local Variable: org.apache.solr.core.SolrCore#3
            Local Variable: java.lang.String#25968
            Local Variable: org.apache.solr.response.SolrQueryResponse#3
            Local Variable: org.eclipse.jetty.servlet.ServletHandler$CachedChain#15
            Local Variable: org.apache.solr.servlet.cache.Method#3
            Local Variable: org.apache.solr.servlet.SolrRequestParsers$1#3
            Local Variable: org.apache.solr.servlet.SolrDispatchFilter#2
            Local Variable: org.apache.solr.core.CoreContainer#3
            Local Variable: org.apache.solr.servlet.SolrRequestParsers#3
            Local Variable: org.apache.solr.core.SolrConfig#3
          at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
          at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
          at org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:136)
          at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
          at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
            Local Variable: org.eclipse.jetty.servlet.ServletHandler$CachedChain#13
          at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:229)
          at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
          at org.eclipse.jetty.server.handler.GzipHandler.handle(GzipHandler.java:301)
            Local Variable: org.eclipse.jetty.server.handler.GzipHandler#2
          at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1077)
          at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
            Local Variable: org.eclipse.jetty.servlet.ServletHolder#2
            Local Variable: org.eclipse.jetty.servlet.ServletHandler#2
          at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
            Local Variable: org.eclipse.jetty.server.session.SessionHandler#2
          at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
            Local Variable: java.lang.String#26053
            Local Variable: org.eclipse.jetty.servlet.ServletContextHandler#2
            Local Variable: org.eclipse.jetty.server.Request#4
            Local Variable: javax.servlet.DispatcherType#1
            Local Variable: org.eclipse.jetty.server.Response#4
          at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
          at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
          at org.eclipse.jetty.server.Server.handle(Server.java:368)
          at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
            Local Variable: org.eclipse.jetty.server.Server#2
            Local Variable: java.lang.String#25761
          at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
          at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
            Local Variable: org.eclipse.jetty.io.View#5
            Local Variable: org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler#4
          at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
          at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
            Local Variable: org.eclipse.jetty.http.HttpParser#4
          at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
            Local Variable: org.eclipse.jetty.server.AsyncHttpConnection#4
          at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:628)
            Local Variable: org.eclipse.jetty.io.nio.SelectChannelEndPoint#4
          at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
            Local Variable: org.eclipse.jetty.io.nio.SelectChannelEndPoint$1#4
          at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
          at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
          at java.lang.Thread.run(Thread.java:662)
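For reference, the LocationInfo frames at the top of that dump boil down to the pattern below - a simplified re-creation for illustration, not log4j's actual code: the logger materializes a full stack trace from a fresh Throwable (the Throwable.getStackTraceElement native call in the dump) and scans for the first frame outside the logging framework to find the call site.

```java
// Simplified sketch of log4j's caller-location lookup (assumption: this is a
// re-creation for illustration, not the real org.apache.log4j.spi.LocationInfo).
public class LocationSketch {
    static class Logger {
        String info(String msg) {
            // Building the trace is what triggers Throwable.getStackTraceElement
            // for every frame -- costly on a hot logging path.
            StackTraceElement[] frames = new Throwable().getStackTrace();
            for (StackTraceElement f : frames) {
                // Skip the logger's own frames; the first foreign frame is the call site.
                if (!f.getClassName().equals(Logger.class.getName())) {
                    return f.getClassName() + ":" + f.getLineNumber();
                }
            }
            return "?:?";
        }
    }

    public static void main(String[] args) {
        // Prints the call site, e.g. "LocationSketch:<line>".
        System.out.println(new Logger().info("hello"));
    }
}
```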

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de




-----Original Message-----

From: Uwe Schindler [mailto:uwe@thetaphi.de]

Sent: Tuesday, July 09, 2013 9:32 AM

To: dev@lucene.apache.org

Subject: RE: Several builds hanging because of permgen

 

I inspected the heap dumps already; a few things are interesting:

- The heap dump file is quite small (only approx. 50 MB), so the permgen
issues start when not even -Xmx is used completely.

- There are lots of (250) reflection class loaders, but no leaks. All class loaders
exist one time; just the reflection-based sun.reflect.DelegatingClassLoader
with the stubs for speeding up reflection shows up, but only with one
class per loader. Which is perfectly fine.

- ~2500 classes loaded, which is not much.

How do I get the number of interned Strings from jvisualvm?
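(One hedged answer, as a sketch: I'm not sure jvisualvm itself exposes it, but on a HotSpot JDK 7/8 the plain jmap tool reports the intern table size at the end of its heap summary. `<pid>` below is a placeholder for the test JVM's process id, e.g. from jps.)

```
# Sketch: query a running JVM's interned-string count with stock JDK tools.
# On HotSpot JDK 7/8 the heap summary ends with a line like
# "NNNN interned Strings occupying MMM bytes."
# (jmap -heap was removed in JDK 9+; there use: jhsdb jmap --heap --pid <pid>)
jmap -heap <pid> | tail -n 3
```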

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de

eMail: uwe@thetaphi.de

 

 

-----Original Message-----

From: dawid.weiss@gmail.com [mailto:dawid.weiss@gmail.com] On Behalf Of Dawid Weiss

Sent: Tuesday, July 09, 2013 9:13 AM

To: dev@lucene.apache.org

Subject: Re: Several builds hanging because of permgen

 

What's the core of those permgen leaks, guys? Is it the number of
classes (via different classloaders)? The number of interned strings?
Maybe something is going on that we should reflect on before we raise
the default permgen limit? Our typical permgen headaches were due to
repeated classloader leaks -- is Hadoop doing something like that?

 

Dawid

 

On Tue, Jul 9, 2013 at 8:25 AM, Uwe Schindler <uwe@thetaphi.de> wrote:

Hi Mark,

 

The problem with raising permgen is:

- It's Hotspot-specific, so it does not work with other JVMs.

- It's no longer available in Java 8.

I would really prefer to tune the tests and maybe not create so
many nodes in the cloud tests. It looks like the bug happens more
often with a higher test multiplier (-Dtests.multiplier=3), so maybe we
can really tune that.

If we want to raise permgen, we have to do it in a similar way to how
we enable the heap dumps - with lots of <condition/> tasks in ANT...
:(
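For what it's worth, a sketch of what that conditional could look like (property names here are made up, not from our build files, mirroring how the heap dumps are conditionally enabled):

```xml
<!-- Sketch only: hypothetical property names. PermGen flags exist only on
     HotSpot before Java 8, so guard the option: -->
<condition property="tests.permgen.args"
           value="-XX:MaxPermSize=128m"
           else="">
  <and>
    <contains string="${java.vm.name}" substring="HotSpot"/>
    <not><equals arg1="${ant.java.version}" arg2="1.8"/></not>
  </and>
</condition>
<!-- then pass ${tests.permgen.args} as a jvmarg to the test runner -->
```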

 

Uwe

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de

eMail: uwe@thetaphi.de

 

 

-----Original Message-----

From: Mark Miller [mailto:markrmiller@gmail.com]

Sent: Tuesday, July 09, 2013 5:20 AM

To: dev@lucene.apache.org

Subject: Re: Several builds hanging because of permgen

 

Looks like we currently don't set the max perm gen for tests, so you
get the default - I think we want to change that regardless - we
don't want it to vary, IMO - it should work like Xmx.

I think we should just set it to 128 MB, and these tests should have
plenty of room to run.

 

- Mark

 

On Jul 8, 2013, at 11:06 PM, Mark Miller <markrmiller@gmail.com> wrote:

 

 

On Jul 8, 2013, at 2:26 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

 

Next one:

http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6395/console

 

I will vote -1 on all 4.4 releases until this is fixed!

 

You can't veto a release, so it's kind of a hollow threat ;)

 

It hangs on my local computer, too! Tests pass only ½ of the time;
the remaining time it hangs with permgen errors.

 

Depending on the host OS and Java version, I have had to raise the max
perm gen a bit higher for heavy SolrCloud tests that also start up
hdfs. I think I remember raising it from 96 MB to 128 MB. It's simply
a test resource issue - those tests are very heavy, which is why most
are set to run nightly. I've seen the code run heavily without any
perm gen issues though - and on some of my machines, I don't have to
raise the perm gen at all.

 

- Mark

 

 

Uwe

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de

eMail: uwe@thetaphi.de

 

From: dawid.weiss@gmail.com [mailto:dawid.weiss@gmail.com] On Behalf Of Dawid Weiss

Sent: Monday, July 08, 2013 2:16 PM

To: dev@lucene.apache.org

Subject: Re: Several builds hanging because of permgen

 

 

Not much I can do from my side about permgen errors. There is really no
way to deal with these from within Java (the same process) -- you
cannot effectively handle anything because your own classes may not
load at all.

 

Dawid

 

On Mon, Jul 8, 2013 at 1:35 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

Another one, this time on OSX:

http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/617/

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de

eMail: uwe@thetaphi.de

 

From: Robert Muir [mailto:rcmuir@gmail.com]

Sent: Sunday, July 07, 2013 11:15 PM

To: dev@lucene.apache.org

Subject: Re: Several builds hanging because of permgen

 

When there were leaks from static classes, we added a checker to
LuceneTestCase that looks for RAM > N and fails with debugging
information.

I wonder if some similar check is possible for this case (to make
it easier than going through heap dumps, and to find issues before crash-time)...

 

On Sun, Jul 7, 2013 at 4:10 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

Another one:
http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6385/testReport/junit/junit.framework/TestSuite/org_apache_solr_request_SimpleFacetsTest/

Had to be killed with kill -9

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de

eMail: uwe@thetaphi.de

 

 

-----Original Message-----

From: Uwe Schindler [mailto:uwe@thetaphi.de]

Sent: Saturday, July 06, 2013 10:16 PM

To: dev@lucene.apache.org

Subject: RE: Several builds hanging because of permgen

 

Another one:

http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6375/console

 

I was only able to kill the JVM with kill -9. I am sure, it's horrible
slowdoop!

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de

eMail: uwe@thetaphi.de

 

 

-----Original Message-----

From: Uwe Schindler [mailto:uwe@thetaphi.de]

Sent: Friday, July 05, 2013 3:59 PM

To: dev@lucene.apache.org

Subject: Several builds hanging because of permgen

 

Several Jenkins builds now hang because of permgen. The runner JVM
is dead (can only be killed by -9); last example:

http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6360/console

 

-----

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de

eMail: uwe@thetaphi.de

 

 

 

 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

 

 


 

