lucene-java-user mailing list archives

From "Ramana Jelda" <ramana.je...@ciao-group.com>
Subject RE: OutOfMemoryError while enumerating through reader.terms(fieldName)
Date Tue, 02 May 2006 14:57:48 GMT
I just got an idea for category counting that avoids this BitSet
approach.

I will maintain an array indexed by docId, with the category_id as the
value.

i.e. documents[docId] =category_id

For 1 million docs that takes around 8 MBytes, counting 4 bytes for each
docId and 4 bytes for each category_id.

Then, for each user query, I will use the docIds passed to a HitCollector
to look up each hit's category and increment that category's count. I
think it is self-explanatory.

What do you think?
Any advice is really welcome.

Note: I actually have around 20000 unique category ids.
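
A minimal sketch of the counting idea above, not a tested implementation.
The class name CategoryCounter and its methods are made up for
illustration. Its collect(int, float) method only mirrors the shape of
Lucene's HitCollector.collect(int doc, float score); the class does not
extend HitCollector, so the sketch compiles without Lucene. One point on
the memory estimate: because the docId is the array index, an int[] with
1 million entries stores only the category ids, which is about 4 MBytes.

```java
import java.util.Arrays;

/**
 * Counts hits per category using a docId -> categoryId array.
 * Hypothetical standalone class: in a real application the collect()
 * logic would live in a HitCollector subclass passed to the searcher.
 */
class CategoryCounter {
    // docToCategory[docId] = categoryId. The docId is the array index,
    // so only the category id is stored for each document.
    private final int[] docToCategory;
    // counts[categoryId] = number of hits seen in that category.
    private final int[] counts;

    CategoryCounter(int[] docToCategory, int numCategories) {
        this.docToCategory = docToCategory;
        this.counts = new int[numCategories];
    }

    /** Call once per matching document; the score is ignored here. */
    void collect(int doc, float score) {
        counts[docToCategory[doc]]++;
    }

    int countFor(int categoryId) {
        return counts[categoryId];
    }

    /** Clear the counts before reusing the counter for the next query. */
    void reset() {
        Arrays.fill(counts, 0);
    }

    public static void main(String[] args) {
        // Toy index: 5 documents spread over 3 categories.
        int[] docToCategory = {0, 2, 2, 1, 2};
        CategoryCounter counter = new CategoryCounter(docToCategory, 3);

        // Pretend the query matched documents 1, 2 and 3.
        int[] hits = {1, 2, 3};
        for (int doc : hits) {
            counter.collect(doc, 1.0f);
        }
        for (int c = 0; c < 3; c++) {
            System.out.println("category " + c + ": " + counter.countFor(c));
        }
    }
}
```

The work per query is proportional to the number of hits, and the memory
for the counts depends only on the number of categories (about 20000
ints here), not on the number of documents.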

Thx,
Jelda

> -----Original Message-----
> From: Ramana Jelda [mailto:ramana.jelda@ciao-group.com] 
> Sent: Tuesday, May 02, 2006 4:41 PM
> To: java-user@lucene.apache.org
> Subject: RE: OutOfMemoryError while enumerating through 
> reader.terms(fieldName)
> 
> I am trying to implement category counts in a way that is almost the
> same as the CNET approach.
> At initialization time, I create all these BitSets and then AND
> them with the user query (using a BitSet obtained from a QueryFilter
> containing the user query).
> 
> This way my application performs well. Don't you think so?
> Actually, I need all those BitSets every time a user queries. I
> cannot use the existing Lucene filter approach, can I?
> 
> Thx in advance,
> Jelda
> 
> 
> 
> 
> > -----Original Message-----
> > From: mark harwood [mailto:markharw00d@yahoo.co.uk]
> > Sent: Tuesday, May 02, 2006 4:19 PM
> > To: java-user@lucene.apache.org
> > Subject: RE: OutOfMemoryError while enumerating through
> > reader.terms(fieldName)
> > 
> > >>Any advice is really welcome.
> > 
> > Don't cache all that data.
> > You need a minimum of (numUniqueTerms*numDocs)/8 bytes to hold that 
> > info.
> > Assuming 10,000 unique terms and 1 million docs you'd need over 1 Gig
> > of RAM.
> > 
> > I suppose the question is what are you trying to achieve and why can't
> > you use the existing Lucene APIs instead of caching all those bitsets?
> > 
> > Cheers
> > Mark
> > 
> > 
> > 
> > 
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> > 