lucene-solr-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Indexing very large files.
Date Wed, 16 Jan 2008 16:04:29 GMT
I don't think this is a StringBuilder limitation. More likely your JVM
isn't being started with enough heap memory, i.e. the -Xmx setting.
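
To check what heap limit a JVM actually ends up with, a minimal sketch
along these lines may help. The class name HeapCheck is made up for
illustration; Runtime.maxMemory() is the standard call. Since the traces
below come from Ant's JUnit task, note that a forked test JVM takes its
limit from the task's own settings (fork="true" plus the maxmemory
attribute), not from the JVM that launched Ant.

```java
public class HeapCheck {
    /** Returns the maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run as, for example: java -Xmx512m HeapCheck
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Calling maxHeapMiB() from inside the failing test should show whether
the -Xmx value is really reaching the JVM that runs it.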

In raw Lucene, I've indexed 240M files.

Best
Erick

On Jan 16, 2008 10:12 AM, David Thibault <dave@itstrategypartners.com>
wrote:

> All,
> I just found a thread about this on the mailing list archives because I'm
> troubleshooting the same problem.  The kicker is that it doesn't take such
> large files to kill the StringBuilder.  I have discovered the following:
>
> With a text file of 3,443,464 bytes or less, I get no error.
>
> At 3,443,465 bytes:
>
>
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>        at java.lang.String.<init>(String.java:208)
>        at java.lang.StringBuilder.toString(StringBuilder.java:431)
>        at org.junit.Assert.format(Assert.java:321)
>        at org.junit.ComparisonFailure$ComparisonCompactor.compact(ComparisonFailure.java:80)
>        at org.junit.ComparisonFailure.getMessage(ComparisonFailure.java:37)
>        at java.lang.Throwable.getLocalizedMessage(Throwable.java:267)
>        at java.lang.Throwable.toString(Throwable.java:344)
>        at java.lang.String.valueOf(String.java:2615)
>        at java.io.PrintWriter.print(PrintWriter.java:546)
>        at java.io.PrintWriter.println(PrintWriter.java:683)
>        at java.lang.Throwable.printStackTrace(Throwable.java:510)
>        at org.apache.tools.ant.util.StringUtils.getStackTrace(StringUtils.java:96)
>        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.getFilteredTrace(JUnitTestRunner.java:856)
>        at org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.formatError(XMLJUnitResultFormatter.java:280)
>        at org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.addError(XMLJUnitResultFormatter.java:255)
>        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner$4.addError(JUnitTestRunner.java:988)
>        at junit.framework.TestResult.addError(TestResult.java:38)
>        at junit.framework.JUnit4TestAdapterCache$1.testFailure(JUnit4TestAdapterCache.java:51)
>        at org.junit.runner.notification.RunNotifier$4.notifyListener(RunNotifier.java:96)
>        at org.junit.runner.notification.RunNotifier$SafeNotifier.run(RunNotifier.java:37)
>        at org.junit.runner.notification.RunNotifier.fireTestFailure(RunNotifier.java:93)
>        at org.junit.internal.runners.TestMethodRunner.addFailure(TestMethodRunner.java:104)
>        at org.junit.internal.runners.TestMethodRunner.runUnprotected(TestMethodRunner.java:87)
>        at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:34)
>        at org.junit.internal.runners.TestMethodRunner.runMethod(TestMethodRunner.java:75)
>        at org.junit.internal.runners.TestMethodRunner.run(TestMethodRunner.java:45)
>        at org.junit.internal.runners.TestClassMethodsRunner.invokeTestMethod(TestClassMethodsRunner.java:71)
>        at org.junit.internal.runners.TestClassMethodsRunner.run(TestClassMethodsRunner.java:35)
>        at org.junit.internal.runners.TestClassRunner$1.runUnprotected(TestClassRunner.java:42)
>        at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:34)
>        at org.junit.internal.runners.TestClassRunner.run(TestClassRunner.java:52)
>        at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:32)
>
>
>
> At 3,443,466 bytes (or more):
>
>
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
>        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:393)
>        at java.lang.StringBuilder.append(StringBuilder.java:120)
>        at org.junit.Assert.format(Assert.java:321)
>        at org.junit.ComparisonFailure$ComparisonCompactor.compact(ComparisonFailure.java:80)
>        at org.junit.ComparisonFailure.getMessage(ComparisonFailure.java:37)
>        at java.lang.Throwable.getLocalizedMessage(Throwable.java:267)
>        at java.lang.Throwable.toString(Throwable.java:344)
>        at java.lang.String.valueOf(String.java:2615)
>        at java.io.PrintWriter.print(PrintWriter.java:546)
>        at java.io.PrintWriter.println(PrintWriter.java:683)
>        at java.lang.Throwable.printStackTrace(Throwable.java:510)
>        at org.apache.tools.ant.util.StringUtils.getStackTrace(StringUtils.java:96)
>        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.getFilteredTrace(JUnitTestRunner.java:856)
>        at org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.formatError(XMLJUnitResultFormatter.java:280)
>        at org.apache.tools.ant.taskdefs.optional.junit.XMLJUnitResultFormatter.addError(XMLJUnitResultFormatter.java:255)
>        at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner$4.addError(JUnitTestRunner.java:988)
>        at junit.framework.TestResult.addError(TestResult.java:38)
>        at junit.framework.JUnit4TestAdapterCache$1.testFailure(JUnit4TestAdapterCache.java:51)
>        at org.junit.runner.notification.RunNotifier$4.notifyListener(RunNotifier.java:96)
>        at org.junit.runner.notification.RunNotifier$SafeNotifier.run(RunNotifier.java:37)
>        at org.junit.runner.notification.RunNotifier.fireTestFailure(RunNotifier.java:93)
>        at org.junit.internal.runners.TestMethodRunner.addFailure(TestMethodRunner.java:104)
>        at org.junit.internal.runners.TestMethodRunner.runUnprotected(TestMethodRunner.java:87)
>        at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:34)
>        at org.junit.internal.runners.TestMethodRunner.runMethod(TestMethodRunner.java:75)
>        at org.junit.internal.runners.TestMethodRunner.run(TestMethodRunner.java:45)
>        at org.junit.internal.runners.TestClassMethodsRunner.invokeTestMethod(TestClassMethodsRunner.java:71)
>        at org.junit.internal.runners.TestClassMethodsRunner.run(TestClassMethodsRunner.java:35)
>        at org.junit.internal.runners.TestClassRunner$1.runUnprotected(TestClassRunner.java:42)
>        at org.junit.internal.runners.BeforeAndAfterRunner.runProtected(BeforeAndAfterRunner.java:34)
>        at org.junit.internal.runners.TestClassRunner.run(TestClassRunner.java:52)
>
>
> I am writing a filesystem crawler, so I need to be able to crawl and index
> files of any size (within reason), and a 3-4 MB file is certainly within
> reason.  I rewrote my code to store the file contents in a file and
> read/write it one line at a time.  However, when I post the XML file to
> Solr using SimplePostTool, I get another OutOfMemoryError about the Java
> heap space (thrown from org.xmlpull... again).  Does anyone have any ideas
> about this?  Has anyone successfully posted documents with contents larger
> than 3.5 MB to Solr?  If so, how was it done?  I'm using Solr v1.2.
>
>
> Best,
>
> Dave
>
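
For reference, the line-at-a-time approach described in the quoted
message could look roughly like the sketch below. It is an illustration
only, not the poster's actual code: the class and method names and the
"id" and "text" field names are assumptions, and the output uses Solr's
<add><doc><field> update format. It escapes only &, < and >, and does
not filter characters that are illegal in XML.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

public class StreamingSolrDoc {
    /** Escapes the characters that are special in XML text content. */
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Copies the input into a Solr add document one line at a time, so
     * only a single line is held in memory rather than the whole file.
     * The field names "id" and "text" are hypothetical.
     */
    static void writeAddXml(String id, Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        writer.write("<add><doc><field name=\"id\">");
        writer.write(escapeXml(id));
        writer.write("</field><field name=\"text\">");
        String line;
        while ((line = reader.readLine()) != null) {
            writer.write(escapeXml(line));
            writer.write('\n'); // readLine() strips the line terminator
        }
        writer.write("</field></doc></add>");
        writer.flush();
    }
}
```

This only bounds memory while the XML file is being built. Whatever
reads that file afterwards still needs enough heap of its own, which is
where -Xmx comes back in.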
