lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From tsuraan <tsur...@gmail.com>
Subject huge tii files
Date Tue, 17 Jun 2008 18:09:15 GMT
I have a collection of indices with a total of about 7,000,000
documents between them all.  When I attempt to run a search over these
indices, the searching process's memory usage increases to ~1.7GB if I
allow java to use that much memory.  If I don't (my normal memory cap
is 512MB), I get the following exception:

Exception in thread "Thread-2" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOfRange(Arrays.java:3209)
        at java.lang.String.<init>(String.java:216)
        at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:104)
        at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:159)
        at org.apache.lucene.index.TermInfosReader.ensureIndexIsRead(TermInfosReader.java:119)
        at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:157)
        at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:419)
        at org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:87)
        at org.apache.lucene.search.Searcher.docFreqs(Searcher.java:178)
        at org.apache.lucene.search.MultiSearcher.createWeight(MultiSearcher.java:311)
        at org.apache.lucene.search.Searcher.search(Searcher.java:118)
        at org.apache.lucene.search.Searcher.search(Searcher.java:97)
        at SearchThread.run(SearchThread.java:54)

So, it looks like simply attempting to read the .tii files from the
indices is taking huge amounts of RAM.  This is only happening on one
machine; other machines with similar data run just with 256-512MB
memory restrictions, so I'm trying to figure out what could cause the
.tii files to become so bloated.  Is there anything I can do to fix
these indices?  Searching is also very slow on this machine; many
machines with tens of millions of documents can do searches with
subsecond responses, whereas this machine takes many seconds to call
its HitCollector's collect function for the first time.

Any suggestions about how to slim down the .tii files on this machine
(or any workarounds) would be much appreciated.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message