lucene-dev mailing list archives

From "Rob Tulloh (Commented) (JIRA)" <>
Subject [jira] [Commented] (SOLR-2990) solr OOM issues
Date Thu, 29 Dec 2011 14:27:30 GMT


Rob Tulloh commented on SOLR-2990:

In this particular test, we are using 2 threads to feed a single Solr instance. We batch documents
according to these parameters:

1. Max bytes: 5M
2. Max docs: 200

These are thresholds, so a single document larger than 5M can still get fed to Solr in a batch
by itself. However, what I observe is that content type, rather than size, is causing the issues.
I have seen 2 behaviors of concern. The first is slow/sluggish behavior: output from our load
generator shows that Solr/Tika sometimes takes over 10 minutes to ingest some content. In one
test set, I feed 4 documents in a single batch and it takes over 13 minutes for these 4 documents
to get indexed. This was run against an empty Solr index. The other behavior is OOM.
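
For concreteness, the batching thresholds described above can be sketched roughly as follows.
This is only a minimal illustration, not the actual feeder code: the class name DocBatcher, the
flush-before-add ordering, and reading "5M" as 5 MiB are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical feeder-side batcher. Names and structure are illustrative only
 * and are not taken from the real load generator.
 */
class DocBatcher {
    // Assumption: "5M" is read as 5 MiB.
    static final long MAX_BYTES = 5L * 1024 * 1024;
    static final int MAX_DOCS = 200;

    /** Splits a stream of document sizes (in bytes) into batches of sizes. */
    static List<List<Long>> batch(List<Long> docSizes) {
        List<List<Long>> batches = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : docSizes) {
            boolean overBytes = currentBytes + size > MAX_BYTES;
            boolean overDocs = current.size() + 1 > MAX_DOCS;
            // Flush first if adding this document would push a non-empty batch
            // past either threshold. An empty batch always accepts the document,
            // which is why an oversized document ends up in a batch by itself.
            if (!current.isEmpty() && (overBytes || overDocs)) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Under these assumptions, a 6 MiB document arriving between two small ones is sent alone, and 450
tiny documents are split into batches of 200, 200, and 50.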

I cannot share the content because it is proprietary. I am happy to provide more details
from Solr and/or Tika if you can tell me what to look for or what debugging I should enable to
capture helpful information.
> solr OOM issues
> ---------------
>                 Key: SOLR-2990
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>          Components: clients - java
>    Affects Versions: 4.0
>         Environment: CentOS 5.x/6.x
> Solr Build apache-solr-4.0-2011-11-04_09-29-42 (includes tika 1.0)
> java -server -Xms2G -Xmx2G -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/oom/solr.dump.1 -DSTOP.PORT=8907 -DSTOP.KEY=STOP -jar start.jar
>            Reporter: Rob Tulloh
> We see intermittent OutOfMemory errors caused by Tika failing to process content. Here is an example:
> Dec 29, 2011 7:12:05 AM org.apache.solr.common.SolrException log
> SEVERE: java.lang.OutOfMemoryError: Java heap space
>         at org.apache.poi.hmef.attribute.TNEFAttribute.<init>(
>         at org.apache.poi.hmef.attribute.TNEFAttribute.create(
>         at org.apache.poi.hmef.HMEFMessage.process(
>         at org.apache.poi.hmef.HMEFMessage.process(
>         at org.apache.poi.hmef.HMEFMessage.process(
>         at org.apache.poi.hmef.HMEFMessage.process(
>         at org.apache.poi.hmef.HMEFMessage.<init>(
>         at
>         at org.apache.tika.parser.CompositeParser.parse(
>         at org.apache.tika.parser.CompositeParser.parse(
>         at org.apache.tika.parser.AutoDetectParser.parse(
>         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(
>         at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(
>         at org.apache.solr.core.SolrCore.execute(
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(
>         at org.mortbay.jetty.servlet.ServletHandler.handle(
>         at
>         at org.mortbay.jetty.servlet.SessionHandler.handle(
>         at org.mortbay.jetty.handler.ContextHandler.handle(
>         at org.mortbay.jetty.webapp.WebAppContext.handle(
