lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From <Paul_Murd...@emainc.com>
Subject RE: Indexing large files? - No answers yet...
Date Fri, 11 Sep 2009 11:57:23 GMT
This issue is still open.  Any suggestions/help with this would be
greatly appreciated.

Thanks,

Paul


-----Original Message-----
From: java-user-return-42080-Paul_Murdoch=emainc.com@lucene.apache.org
[mailto:java-user-return-42080-Paul_Murdoch=emainc.com@lucene.apache.org
] On Behalf Of Paul_Murdoch@emainc.com
Sent: Monday, August 31, 2009 10:28 AM
To: java-user@lucene.apache.org
Subject: Indexing large files?

Hi,

 

I'm working with Lucene 2.4.0 and the JVM (JDK 1.6.0_07).  I'm
consistently receiving "OutOfMemoryError: Java heap space", when trying
to index large text files.

 

Example 1: Indexing a 5 MB text file runs out of memory with a 16 MB
max. heap size.  So I increased the max. heap size to 512 MB.  This
worked for the 5 MB text file, but Lucene still used 84 MB of heap space
to do this.  Why so much?

 

The class FreqProxTermsWriterPerField appears to be the biggest memory
consumer by far according to JConsole and the TPTP Memory Profiling
plugin for Eclipse Ganymede.

 

Example 2: Indexing a 62 MB text file runs out of memory with a 512 MB
max. heap size.  Increasing the max. heap size to 1024 MB works but
Lucene uses 826 MB of heap space while performing this.  Still seems
like way too much memory is being used to do this.  I'm sure larger
files would cause the error as it seems correlative.

 

I'm on a Windows XP SP2 platform with 2 GB of RAM.  So what is the best
practice for indexing large files?  Here is a code snippet that I'm
using:

 

// Index the content of a text file.

      private Boolean saveTXTFile(File textFile, Document textDocument)
throws CIDBException {           

            

            try {             

                              

                  Boolean isFile = textFile.isFile();

                  Boolean hasTextExtension =
textFile.getName().endsWith(".txt");

                  

                  if (isFile && hasTextExtension) {

             

                        System.out.println("File " +
textFile.getCanonicalPath() + " is being indexed");

                        Reader textFileReader = new
FileReader(textFile);

                        if (textDocument == null)

                              textDocument = new Document();

                        textDocument.add(new Field("content",
textFileReader));

                        indexWriter.addDocument(textDocument);
// BREAKS HERE!!!!

                  }                    

            } catch (FileNotFoundException fnfe) {

                  System.out.println(fnfe.getMessage());

                  return false;

            } catch (CorruptIndexException cie) {

                  throw new CIDBException("The index has become
corrupt.");

            } catch (IOException ioe) {

                  System.out.println(ioe.getMessage());

                  return false;

            }                    

            return true;

      }

 

 

Thanks much,

 

Paul

 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message