lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nuno Seco <ns...@dei.uc.pt>
Subject OutOfMemoryError when using Sort
Date Thu, 12 Nov 2009 15:57:37 GMT
Hello List.

I'm having a problem when I add a Sort object to my searcher:
    docs = searcher.search(parser.parse(search), null, 50, sort);

Every time I execute a query I get an OutOfMemoryError exception.
But if I execute the query without the Sort object it works fine

Let me briefly explain how my index is structured.
I'm indexing the Google 5Grams
(http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html). 

The index just has two fields:
    data = new Field("data", tokens[0], Field.Store.YES,
Field.Index.ANALYZED, Field.TermVector.NO);
    count = new Field("count", tokens[1], Field.Store.YES,
Field.Index.NO, Field.TermVector.NO);

the data corresponds to the 5 gram; e.g.: "my business manager informed me"
and the count is simply an integer that represents the frequency of the
ngram.

The index size after optimization is 63G.

If I do not store the data field using:
    data = new Field("data", tokens[0], Field.Store.NO,
Field.Index.ANALYZED, Field.TermVector.NO);
the total size drops to 32G


But using either index with the Sort object causes the exception. I'm
creating the Sort object like:
    Sort sort = new Sort(new SortField("count", SortField.INT));

Note: That even with out using the Sort object I still need to pump the
jvm to 2G (-Xmx2048m). But thats ok...


So.... Basically what I want is to order those first 50 hits I get
according to their frequency counts (count field).


I'm using:
java version "1.6.0_16" (64 bit)
lucene 2.9.1
linux ext3 FS
linux kernel 2.6.31-15

Can anybody help me or redirect me in the right direction?

Thanks

--
Nuno Seco

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message