lucene-dev mailing list archives

From "Yonik Seeley" <ysee...@gmail.com>
Subject Re: Loading 5gb index to RAMDirectory
Date Tue, 23 May 2006 14:28:27 GMT
Hi Michael,

The java-commits mailing list is not for posting to.
Bug reports or fixes normally get put in a JIRA.

I do think this is a good limitation to fix.
Going from int to long only costs a single cycle, and that's only on
buffer refills (i.e. negligible).

There are other places in RAMInputStream & RAMOutputStream that need
fixing too.  I'll handle that.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server


On 5/23/06, Michael Chan <dayzman@gmail.com> wrote:
> Hi,
>
> I have a 5 GB index at hand, stored on disk. I tried creating a
> RAMDirectory out of it, and it crashes every time at around the 2 GB
> mark. I simply create it using:
>
> RAMDirectory ramDir = new RAMDirectory("index");
>
> where "index" is the path. The error messages are as follows:
>
> "bash-2.03$ Exception in thread "main" java.lang.ExceptionInInitializerError
>        at TaxonomyFinder.RelatedCatsFinder.<init>(RelatedCatsFinder.java:46)
>        at wikipedia.WikipediaAnalyser$ExtractAbstractHandler.endElement(WikipediaAnalyser.java:295)
>        at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
>        at org.apache.xerces.impl.XMLNSDocumentScannerImpl.scanEndElement(Unknown Source)
>        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
>        at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
>        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>        at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
>        at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
>        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>        at wikipedia.WikipediaAnalyser.parseAbstracts(WikipediaAnalyser.java:184)
>        at wikipedia.WikipediaAnalyser.getRelatedCategories(WikipediaAnalyser.java:127)
>        at TaxonomyFinder.TaxonomyTreeMaker.main(TaxonomyTreeMaker.java:492)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -2097152
>        at java.util.Vector.elementAt(Unknown Source)
>        at org.apache.lucene.store.RAMOutputStream.flushBuffer(RAMOutputStream.java:82)
>        at org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:84)
>        at org.apache.lucene.store.BufferedIndexOutput.writeBytes(BufferedIndexOutput.java:52)
>        at org.apache.lucene.store.RAMDirectory.<init>(RAMDirectory.java:68)
>        at org.apache.lucene.store.RAMDirectory.<init>(RAMDirectory.java:95)
>        at word_coocurrence.WordCooccurrenceFinder.<clinit>(WordCooccurrenceFinder.java:50)
>        ... 13 more"
>
> I fixed it by changing RAMOutputStream.pointer to a long, and
> lines 72 and 73 of RAMOutputStream.java to:
>
> int bufferNumber = (int) (pointer/BUFFER_SIZE);
> int bufferOffset = (int) (pointer%BUFFER_SIZE);
>
> Now, it all works fine. Maybe this is worth fixing.
>
> Michael
>

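For readers following along, here is a minimal sketch of the arithmetic behind the failure and the fix. It is not the actual RAMOutputStream source: the class and method names are hypothetical, and BUFFER_SIZE = 1024 is an assumption, chosen because it reproduces the index -2097152 reported in the trace. With an int pointer, the position wraps to a negative value once 2 GB have been written, so pointer/BUFFER_SIZE becomes a negative buffer index. With a long pointer, the division happens in 64 bits and only the small result is narrowed to int.

```java
public class PointerOverflowDemo {
    // Assumed buffer size (hypothetical here); 1024 is consistent with
    // the -2097152 index reported in the stack trace.
    static final int BUFFER_SIZE = 1024;

    // Pre-fix behaviour: the write position is held in an int, so it
    // wraps to a negative value once it passes Integer.MAX_VALUE.
    static int bufferNumberWithIntPointer(long bytesWritten) {
        int pointer = (int) bytesWritten; // simulates the int field overflowing
        return pointer / BUFFER_SIZE;
    }

    // Post-fix behaviour: the position is a long; only the quotient,
    // which is far smaller, is narrowed to an int.
    static int bufferNumberWithLongPointer(long bytesWritten) {
        long pointer = bytesWritten;
        return (int) (pointer / BUFFER_SIZE);
    }

    // Offset within the buffer, computed the same way as in the fix.
    static int bufferOffsetWithLongPointer(long bytesWritten) {
        long pointer = bytesWritten;
        return (int) (pointer % BUFFER_SIZE);
    }

    public static void main(String[] args) {
        long twoGiB = 1L << 31; // first position an int can no longer hold
        System.out.println(bufferNumberWithIntPointer(twoGiB));  // -2097152
        System.out.println(bufferNumberWithLongPointer(twoGiB)); // 2097152
    }
}
```

Under these assumptions, a 5 GB file needs 5242880 buffers, which still fits comfortably in an int index, so only the pointer itself has to be widened.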