lucene-dev mailing list archives

From "Eric Pugh (Commented) (JIRA)" <>
Subject [jira] [Commented] (SOLR-2990) solr OOM issues
Date Fri, 30 Dec 2011 13:23:30 GMT


Eric Pugh commented on SOLR-2990:

I have found that Solr Cell is great for small numbers of documents or for quick prototyping.
But as you scale up in either the number or the complexity of documents, it becomes a
bottleneck. The Tika CLI is very easy to use, and if you run Tika extraction outside of Solr
and then just send the text in, you can throw more resources at it than you can when it runs
inside Solr. There is also less risk of bringing down Solr! I wonder if we should put
something in the wiki recommending that, if you start having problems with Solr Cell, you
move to running Tika outside of Solr, and maybe include some sample code?
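
As a rough illustration of the "run Tika outside of Solr, then send the text in" approach described above, here is a minimal Python sketch. It shells out to the Tika CLI in its own JVM (so an OutOfMemoryError during extraction stays in that process) and posts the extracted text to Solr's JSON update handler. The jar path, the Solr URL, the `id` and `text` field names, the heap size, and all function names are assumptions made for illustration; they are not taken from this issue and would need to match your own install and schema.

```python
import json
import subprocess
import urllib.request

# Assumptions for illustration only: adjust to your own environment.
TIKA_JAR = "tika-app.jar"  # hypothetical path to the Tika CLI jar
SOLR_UPDATE_URL = "http://localhost:8983/solr/update/json?commit=true"  # hypothetical URL


def tika_command(path, jar=TIKA_JAR, heap="512m"):
    """Build the command line that runs the Tika CLI in a separate JVM."""
    return ["java", f"-Xmx{heap}", "-jar", jar, "--text", path]


def extract_text(path, timeout=120):
    """Extract plain text with Tika, outside of the Solr process.

    A crash, hang, or OutOfMemoryError here only affects the child JVM;
    subprocess.run raises an exception instead of harming Solr.
    """
    result = subprocess.run(
        tika_command(path), capture_output=True, timeout=timeout, check=True
    )
    # Assumes the CLI output is UTF-8; undecodable bytes are replaced.
    return result.stdout.decode("utf-8", errors="replace")


def build_update_body(doc_id, text):
    """Serialize one document as a JSON update request body."""
    return json.dumps([{"id": doc_id, "text": text}]).encode("utf-8")


def index_file(path, doc_id):
    """Extract a file with Tika and send only the resulting text to Solr."""
    body = build_update_body(doc_id, extract_text(path))
    request = urllib.request.Request(
        SOLR_UPDATE_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `index_file("report.pdf", "report-1")` would then index one file. Because extraction runs in a child process with its own heap limit and timeout, a problematic document fails on its own and can be logged and skipped, and extraction can be moved to other machines without touching the Solr JVM.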

Solr Cell is an awesome feature, but it can also cut you!
> solr OOM issues
> ---------------
>                 Key: SOLR-2990
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>          Components: clients - java
>    Affects Versions: 4.0
>         Environment: CentOS 5.x/6.x
> Solr Build apache-solr-4.0-2011-11-04_09-29-42 (includes tika 1.0)
> java -server -Xms2G -Xmx2G -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/oom/solr.dump.1 -DSTOP.PORT=8907 -DSTOP.KEY=STOP -jar start.jar
>            Reporter: Rob Tulloh
> We see intermittent OutOfMemoryError issues caused by Tika failing to process content. Here is an example:
> Dec 29, 2011 7:12:05 AM org.apache.solr.common.SolrException log
> SEVERE: java.lang.OutOfMemoryError: Java heap space
>         at org.apache.poi.hmef.attribute.TNEFAttribute.<init>(
>         at org.apache.poi.hmef.attribute.TNEFAttribute.create(
>         at org.apache.poi.hmef.HMEFMessage.process(
>         at org.apache.poi.hmef.HMEFMessage.process(
>         at org.apache.poi.hmef.HMEFMessage.process(
>         at org.apache.poi.hmef.HMEFMessage.process(
>         at org.apache.poi.hmef.HMEFMessage.<init>(
>         at
>         at org.apache.tika.parser.CompositeParser.parse(
>         at org.apache.tika.parser.CompositeParser.parse(
>         at org.apache.tika.parser.AutoDetectParser.parse(
>         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(
>         at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(
>         at org.apache.solr.core.SolrCore.execute(
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(
>         at org.mortbay.jetty.servlet.ServletHandler.handle(
>         at
>         at org.mortbay.jetty.servlet.SessionHandler.handle(
>         at org.mortbay.jetty.handler.ContextHandler.handle(
>         at org.mortbay.jetty.webapp.WebAppContext.handle(


