lucene-dev mailing list archives

From Jan Høydahl (JIRA) <>
Subject [jira] [Commented] (SOLR-3881) frequent OOM in LanguageIdentifierUpdateProcessor
Date Tue, 16 Oct 2012 11:49:03 GMT


Jan Høydahl commented on SOLR-3881:

I'm sure it's possible to optimize the memory footprint somehow. The reason we concatenate all
"fl" fields before detection is that Tika's detector gets more accurate the longer the input
text is. So while detection on individual short fields has a high risk of mis-detection, the
concatenated string has a better chance.

A configurable max-cap in the concatenation may make sense, as the detection accuracy flattens
out after some threshold.
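
A minimal sketch of what such a cap could look like, assuming the field values are already available as plain strings. The class name, method name and the maxChars parameter are hypothetical and are not part of the existing LanguageIdentifierUpdateProcessor code:

```java
import java.util.Arrays;
import java.util.List;

class CappedConcat {
    /**
     * Joins the values with single spaces and stops appending once the
     * result has reached maxChars characters (hypothetical cap parameter).
     */
    static String concatCapped(List<String> values, int maxChars) {
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            int remaining = maxChars - sb.length();
            if (remaining <= 0) {
                break; // cap reached, ignore the remaining fields
            }
            if (sb.length() > 0) {
                sb.append(' ');
                remaining--;
            }
            if (v.length() > remaining) {
                sb.append(v, 0, remaining); // take only what still fits
                break;
            }
            sb.append(v);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concatCapped(Arrays.asList("hello", "world"), 8));
    }
}
```

With a cap like this the buffer can never grow beyond maxChars characters, no matter how large the document is.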

Perhaps we could also avoid the expandCapacity() and Arrays.copyOf() calls if we pre-allocate
the StringBuffer with the theoretical max size, i.e. the size of our SolrInputDoc. If a StringBuffer
is at 10kb and needs an extra 10b for an append, it will allocate a new buffer of (10kb+1)*2
capacity, which is a waste. We should also switch to StringBuilder, which avoids StringBuffer's
synchronization overhead.
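
To illustrate the pre-allocation idea, here is a rough sketch that sums the value lengths first and sizes a StringBuilder accordingly, so that no append has to grow the buffer. It works on a plain list of strings rather than a SolrInputDocument, and the class and method names are made up for the example:

```java
import java.util.Arrays;
import java.util.List;

class PresizedConcat {
    /** Joins the values with single spaces using a buffer sized up front. */
    static String concat(List<String> values) {
        // First pass: upper bound on the result length (one separator per value).
        int total = 0;
        for (String v : values) {
            total += v.length() + 1;
        }
        // Second pass: append into a buffer that never needs to expand.
        StringBuilder sb = new StringBuilder(total);
        for (String v : values) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(v);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concat(Arrays.asList("a", "bc", "def")));
    }
}
```

The extra pass over the values is cheap compared to repeatedly copying a growing character array. For very large documents the int sum could overflow, so real code would probably combine this with a max-cap.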
> frequent OOM in LanguageIdentifierUpdateProcessor
> -------------------------------------------------
>                 Key: SOLR-3881
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 4.0
>         Environment: CentOS 6.x, JDK 1.6, (java -server -Xms2G -Xmx2G -XX:+HeapDumpOnOutOfMemoryError
>            Reporter: Rob Tulloh
> We are seeing frequent failures from Solr causing it to OOM. Here is the stack trace we observe when this happens:
> {noformat}
> Caused by: java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(
>         at java.lang.AbstractStringBuilder.expandCapacity(
>         at java.lang.AbstractStringBuilder.append(
>         at java.lang.StringBuffer.append(
>         at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.concatFields(
>         at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.process(
>         at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(
>         at org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(
>         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(
>         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(
>         at org.apache.solr.common.util.JavaBinCodec.readVal(
>         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(
>         at org.apache.solr.common.util.JavaBinCodec.readVal(
>         at org.apache.solr.common.util.JavaBinCodec.unmarshal(
>         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(
>         at org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(
>         at org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(
>         at org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(
>         at org.apache.solr.core.SolrCore.execute(
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(
>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(
>         at
>         at org.eclipse.jetty.server.session.SessionHandler.doHandle(
>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(
>         at org.eclipse.jetty.servlet.ServletHandler.doScope(
>         at org.eclipse.jetty.server.session.SessionHandler.doScope(
>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(
> {noformat}
