lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (LUCENE-7538) Uploading large text file to a field causes "this IndexWriter is closed" error
Date Thu, 10 Nov 2016 10:54:58 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-7538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Michael McCandless resolved LUCENE-7538.
----------------------------------------
       Resolution: Fixed
    Fix Version/s: 6.4
                   master (7.0)

Thanks [~steve_chen_veeva]!

> Uploading large text file to a field causes "this IndexWriter is closed" error
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-7538
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7538
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 5.5.1
>            Reporter: Steve Chen
>             Fix For: master (7.0), 6.4
>
>         Attachments: LUCENE-7538.patch, LUCENE-7538.patch
>
>
> We have seen "this IndexWriter is closed" error after we tried to upload a large text
file to a single Solr text field. The field definition in the schema.xml is:
> {noformat}
> <field name="fileContent" type="text_general" indexed="true" stored="true" termVectors="true"
termPositions="true" termOffsets="true"/>
> {noformat}
> After that, the IndexWriter remained closed and couldn't be recovered until we reloaded
the Solr core.  The text file had size of 800MB, containing only numbers and English characters.
> Stack trace is shown below:
> {noformat}
> 2016-11-02 23:00:17,913 [http-nio-19082-exec-3] ERROR org.apache.solr.handler.RequestHandlerBase
- org.apache.solr.common.SolrException: Exception writing document id 1487_0_1 to the index;
possible analysis error.
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:180)
>         at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:68)
>         at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:934)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1089)
>         at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:712)
>         at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
>         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:126)
>         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:131)
>         at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:237)
>         at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:2082)
>         at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:651)
>         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:458)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:229)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:184)
>         at veeva.ecm.common.interfaces.web.SolrDispatchOverride.doFilter(SolrDispatchOverride.java:43)
>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:212)
>         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
>         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:141)
>         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:79)
>         at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:616)
>         at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
>         at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:521)
>         at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1096)
>         at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:674)
>         at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1500)
>         at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1456)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
>         at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:720)
>         at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:734)
>         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1473)
>         at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:282)
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:214)
>         at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:169)
>         ... 34 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 56
>         at org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:201)
>         at org.apache.lucene.util.UnicodeUtil.UTF16toUTF8(UnicodeUtil.java:183)
>         at org.apache.lucene.codecs.compressing.GrowableByteArrayDataOutput.writeString(GrowableByteArrayDataOutput.java:72)
>         at org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.writeField(CompressingStoredFieldsWriter.java:292)
>         at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:382)
>         at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:321)
>         at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:234)
>         at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:450)
>         at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1477)
>         ... 37 more
> {noformat}
> We debugged and traced down the issue.  It was an integer overflow problem that was not
properly handled.  The GrowableByteArrayDataOutput::writeString(String string) method is shown
below:
> {noformat}
> @Override
>   public void writeString(String string) throws IOException {
>     int maxLen = string.length() * UnicodeUtil.MAX_UTF8_BYTES_PER_CHAR;
>     if (maxLen <= MIN_UTF8_SIZE_TO_ENABLE_DOUBLE_PASS_ENCODING)  {
>       // string is small enough that we don't need to save memory by falling back to
double-pass approach
>       // this is just an optimized writeString() that re-uses scratchBytes.
>       scratchBytes = ArrayUtil.grow(scratchBytes, maxLen);
>       int len = UnicodeUtil.UTF16toUTF8(string, 0, string.length(), scratchBytes);
>       writeVInt(len);
>       writeBytes(scratchBytes, len);
>     } else  {
>       // use a double pass approach to avoid allocating a large intermediate buffer for
string encoding
>       int numBytes = UnicodeUtil.calcUTF16toUTF8Length(string, 0, string.length());
>       writeVInt(numBytes);
>       bytes = ArrayUtil.grow(bytes, length + numBytes);
>       length = UnicodeUtil.UTF16toUTF8(string, 0, string.length(), bytes, length);
>     }
>   }
> {noformat}
> The 800MB text file stored in the string parameter of the method had a length of 800
million, the maxLen became negative integer as the result of the length times 3. The negative
integer was then passed into ArrayUtil.grow(scratchBytes, maxLen):
> {noformat}
> public static byte[] grow(byte[] array, int minSize) {
>     assert minSize >= 0: "size must be positive (got " + minSize + "): likely integer
overflow?";
>     if (array.length < minSize) {
>       byte[] newArray = new byte[oversize(minSize, 1)];
>       System.arraycopy(array, 0, newArray, 0, array.length);
>       return newArray;
>     } else
>       return array;
>   }
> {noformat}
> Assertion was disabled in production so the execution won't stop. The original array
was returned from the method call without increasing the size, which caused an ArrayIndexOutOfBoundsException
to be thrown.  The ArrayIndexOutOfBoundsException was wrapped into AbortingException, and
later on caused the IndexWriter to be closed in IndexWriter class.
> The code should fail faster with a more-specific error for the integer overflow problem,
and shouldn't cause the IndexWriter to be closed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message