lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3881) frequent OOM in LanguageIdentifierUpdateProcessor
Date Mon, 15 Oct 2012 23:51:06 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476584#comment-13476584 ]

Hoss Man commented on SOLR-3881:
--------------------------------

bq. One possible solution is to limit the size of the string that is selected for concatenation.

I don't know if there is any way to make LanguageIdentifierUpdateProcessor more memory efficient
(in particular, I'm not sure why it needs to concat the field values instead of operating
on them directly), but if you want to give langId just the first N characters of another field,
that should already be possible w/o code changes by wiring together the CloneFieldUpdateProcessorFactory
with the TruncateFieldUpdateProcessorFactory.

Something like this should work...

{code}
 ...
 <processor class="solr.CloneFieldUpdateProcessorFactory">
   <str name="source">GIANT_HONKING_STRING_FIELD</str>
   <str name="dest">truncated_string_field_for_lang_detect</str>
 </processor>
 <processor class="solr.TruncateFieldUpdateProcessorFactory">
   <str name="fieldName">truncated_string_field_for_lang_detect</str>
   <int name="maxLength">65536</int>
 </processor>
 <processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
   <!-- <str name="langid.fl">title,subject,GIANT_HONKING_STRING_FIELD</str> -->
   <str name="langid.fl">title,subject,truncated_string_field_for_lang_detect</str>
   ...
 </processor>
 <processor class="solr.IgnoreFieldUpdateProcessorFactory">
   <str name="fieldName">truncated_string_field_for_lang_detect</str>
 </processor>
 ...
{code}

Neither CloneFieldUpdateProcessorFactory nor TruncateFieldUpdateProcessorFactory will make
a full copy of the original String value, and TruncateFieldUpdateProcessorFactory will only
make a truncated copy if the source is longer than the configured max (and even then, whether
any copy is actually made really just depends on how the JVM implements substring). IgnoreFieldUpdateProcessorFactory
will ensure that the truncated copy is eligible for GC as soon as you are done with LangId.
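
To make the "only copies when it has to" point concrete, here is a rough sketch of the truncation logic described above. It is not the actual Solr source: the class name TruncateSketch and the truncate helper are hypothetical, and only the maxLength value is taken from the config example.

```java
// Hypothetical illustration only; this is NOT the Solr implementation.
final class TruncateSketch {

    // Same limit as the maxLength used in the config example.
    static final int MAX_LENGTH = 65536;

    // Returns the original String untouched when it already fits, so no
    // new object is allocated. Only longer values go through substring,
    // and whether substring copies the characters depends on the JVM.
    static String truncate(String value, int maxLength) {
        if (value.length() <= maxLength) {
            return value; // same reference, no copy
        }
        return value.substring(0, maxLength);
    }

    public static void main(String[] args) {
        String small = "short title";

        // Build a value larger than the limit.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200000; i++) {
            sb.append('x');
        }
        String big = sb.toString();

        // The short value comes back as the very same object.
        System.out.println(truncate(small, MAX_LENGTH) == small);
        // The long value is cut down to the configured limit.
        System.out.println(truncate(big, MAX_LENGTH).length());
    }
}
```

With something along these lines, langId only ever sees at most maxLength characters of the big field, so the concatenation buffer stays bounded regardless of how large the original value is.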
                
> frequent OOM in LanguageIdentifierUpdateProcessor
> -------------------------------------------------
>
>                 Key: SOLR-3881
>                 URL: https://issues.apache.org/jira/browse/SOLR-3881
>             Project: Solr
>          Issue Type: Bug
>          Components: update
>    Affects Versions: 4.0
>         Environment: CentOS 6.x, JDK 1.6, (java -server -Xms2G -Xmx2G -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=....)
>            Reporter: Rob Tulloh
>
> We are seeing frequent failures from Solr causing it to OOM. Here is the stack trace we observe when this happens:
> {noformat}
> Caused by: java.lang.OutOfMemoryError: Java heap space
>         at java.util.Arrays.copyOf(Arrays.java:2882)
>         at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
>         at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
>         at java.lang.StringBuffer.append(StringBuffer.java:224)
>         at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.concatFields(LanguageIdentifierUpdateProcessor.java:286)
>         at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.process(LanguageIdentifierUpdateProcessor.java:189)
>         at org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:171)
>         at org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(BinaryUpdateRequestHandler.java:90)
>         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:140)
>         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:120)
>         at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
>         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:105)
>         at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
>         at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
>         at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:147)
>         at org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:100)
>         at org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:47)
>         at org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:58)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
>         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
>         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
>         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
>         at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
>         at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

