lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: OutOfMemoryError when using Sort
Date Thu, 12 Nov 2009 16:40:48 GMT
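To sort on the count field, it must be indexed (but not tokenized); it does
not need to be stored. Your count field is created with Field.Index.NO, so it
is not indexed at all; use Field.Index.NOT_ANALYZED instead. In any case,
sorting needs lots of memory. How many documents do you have?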
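
To show why the document count matters, here is a rough back-of-the-envelope sketch. It assumes that sorting on an int field keeps one 4-byte value in memory for every document in the index, which is how the field cache behaves as far as I know. The class name, method names, and the document count of one billion are hypothetical placeholders, not figures from this index.

```java
// Rough estimate of the heap needed to cache one sort value per document.
// Assumption: the sort cache holds one fixed-size value for every document.
public class SortMemoryEstimate {

    /** Bytes needed to hold one cached value per document. */
    static long estimateBytes(long maxDoc, int bytesPerValue) {
        return maxDoc * (long) bytesPerValue;
    }

    /** Converts a byte count to gibibytes for easier reading. */
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Hypothetical document count; replace with the real number
        // of documents in the index.
        long hypotheticalDocs = 1000000000L;
        int bytesPerInt = 4; // size of a Java int

        long bytes = estimateBytes(hypotheticalDocs, bytesPerInt);
        System.out.printf("%d docs need about %.2f GiB for the sort cache%n",
                hypotheticalDocs, toGiB(bytes));
    }
}
```

With a count in that range the cache alone would need roughly 3.7 GiB, well above a 2G heap, so the real number of documents decides whether sorting can fit at all.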

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Nuno Seco [mailto:nseco@dei.uc.pt]
> Sent: Thursday, November 12, 2009 4:58 PM
> To: java-user@lucene.apache.org
> Subject: OutOfMemoryError when using Sort
> 
> Hello List.
> 
> I'm having a problem when I add a Sort object to my searcher:
>     docs = searcher.search(parser.parse(search), null, 50, sort);
> 
> Every time I execute a query I get an OutOfMemoryError exception.
> But if I execute the query without the Sort object, it works fine.
> 
> Let me briefly explain how my index is structured.
> I'm indexing the Google 5Grams
> (http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html).
> 
> The index just has two fields:
>     data = new Field("data", tokens[0], Field.Store.YES,
>         Field.Index.ANALYZED, Field.TermVector.NO);
>     count = new Field("count", tokens[1], Field.Store.YES,
>         Field.Index.NO, Field.TermVector.NO);
> 
> The data field holds the 5-gram, e.g. "my business manager informed
> me", and the count field is simply an integer that represents the
> frequency of the n-gram.
> 
> The index size after optimization is 63G.
> 
> If I do not store the data field using:
>     data = new Field("data", tokens[0], Field.Store.NO,
>         Field.Index.ANALYZED, Field.TermVector.NO);
> the total size drops to 32G.
> 
> 
> But using either index with the Sort object causes the exception. I'm
> creating the Sort object like:
>     Sort sort = new Sort(new SortField("count", SortField.INT));
> 
> Note that even without using the Sort object I still need to raise the
> JVM heap to 2G (-Xmx2048m). But that's OK...
> 
> 
> So.... Basically what I want is to order those first 50 hits I get
> according to their frequency counts (count field).
> 
> 
> I'm using:
> java version "1.6.0_16" (64 bit)
> lucene 2.9.1
> linux ext3 FS
> linux kernel 2.6.31-15
> 
> Can anybody help me or point me in the right direction?
> 
> Thanks
> 
> --
> Nuno Seco
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org




