lucene-java-commits mailing list archives

From "Michael Chan" <dayz...@gmail.com>
Subject Loading 5gb index to RAMDirectory
Date Tue, 23 May 2006 12:43:11 GMT
Hi,

I have a 5 GB index stored on disk. I tried creating a
RAMDirectory from it, and it crashes every time at around the 2 GB
mark. I create it simply with:

RAMDirectory ramDir = new RAMDirectory("index");

where "index" is the path to the index. The stack trace is as follows:

"bash-2.03$ Exception in thread "main" java.lang.ExceptionInInitializerError
       at TaxonomyFinder.RelatedCatsFinder.<init>(RelatedCatsFinder.java:46)
       at wikipedia.WikipediaAnalyser$ExtractAbstractHandler.endElement(WikipediaAnalyser.java:295)
       at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
       at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)
       at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
       at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
       at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
       at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
       at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
       at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
       at wikipedia.WikipediaAnalyser.parseAbstracts(WikipediaAnalyser.java:184)
       at wikipedia.WikipediaAnalyser.getRelatedCategories(WikipediaAnalyser.java:127)
       at TaxonomyFinder.TaxonomyTreeMaker.main(TaxonomyTreeMaker.java:492)
Caused by: java.lang.ArrayIndexOutOfBoundsException: -2097152
       at java.util.Vector.elementAt(Unknown Source)
       at org.apache.lucene.store.RAMOutputStream.flushBuffer(RAMOutputStream.java:82)
       at org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:84)
       at org.apache.lucene.store.BufferedIndexOutput.writeBytes(BufferedIndexOutput.java:52)
       at org.apache.lucene.store.RAMDirectory.<init>(RAMDirectory.java:68)
       at org.apache.lucene.store.RAMDirectory.<init>(RAMDirectory.java:95)
       at word_coocurrence.WordCooccurrenceFinder.<clinit>(WordCooccurrenceFinder.java:50)
       ... 13 more"

I fixed it by changing the type of RAMOutputStream.pointer to long and
changing lines 72 and 73 of RAMOutputStream.java to:

int bufferNumber = (int) (pointer/BUFFER_SIZE);
int bufferOffset = (int) (pointer%BUFFER_SIZE);

Now, it all works fine. Maybe this is worth fixing.
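
For illustration, the overflow and the fix can be reproduced in isolation. This is only a sketch, not the Lucene source: the class and method names are hypothetical, and BUFFER_SIZE = 1024 is an assumption, chosen because an int pointer that wraps at the 2 GB mark then yields exactly the -2097152 index reported in the trace.

```java
// Hypothetical stand-alone demo; none of these names come from Lucene.
class PointerOverflowDemo {
    // Assumed buffer size (not taken from the message above).
    static final int BUFFER_SIZE = 1024;

    // Buffer index computed from an int pointer, as before the fix.
    static int bufferNumberWithIntPointer(int pointer) {
        return pointer / BUFFER_SIZE;
    }

    // Buffer index and offset computed from a long pointer, as after the fix.
    static int bufferNumberWithLongPointer(long pointer) {
        return (int) (pointer / BUFFER_SIZE);
    }

    static int bufferOffsetWithLongPointer(long pointer) {
        return (int) (pointer % BUFFER_SIZE);
    }

    public static void main(String[] args) {
        // Start of the last full buffer below 2 GB: 2^31 - 1024.
        int intPointer = Integer.MAX_VALUE - BUFFER_SIZE + 1;
        intPointer += BUFFER_SIZE;  // int arithmetic wraps to -2^31
        System.out.println(bufferNumberWithIntPointer(intPointer));   // prints -2097152

        long longPointer = Integer.MAX_VALUE - BUFFER_SIZE + 1;
        longPointer += BUFFER_SIZE; // 2^31, no wrap-around
        System.out.println(bufferNumberWithLongPointer(longPointer)); // prints 2097152
        System.out.println(bufferOffsetWithLongPointer(longPointer)); // prints 0
    }
}
```

The negative buffer number is what a Vector lookup would reject with ArrayIndexOutOfBoundsException. The casts back to int in the long version stay safe as long as the number of buffers itself fits in an int.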

Michael
