lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marcus Falck" <marcus.fa...@observer.se>
Subject SV: Sort problematics
Date Thu, 18 May 2006 18:21:49 GMT
I'm well aware of the trade offs. But if you were aware of the large amounts of data that this
system should be able to search you woldn't propose the usage of a database.
 
Since I have an separate alert service for immediatly alerts up and running i may be able
to do trade offs with the data availability timings, and hold the indexsearcher open for a
longer period.
 
But still. The memory is the problem.
I mean how much memory would the fieldcache take for 500 Millon newsletter articles? Probably
a lot,
ok the system is scaled out over different machines so in reality each machine won't have
500 Million docs but maybe around 100Million.
 
So i'm still interesting in changing the relevance.
Any ideas?
 
/
Marcus

________________________________

Från: Yonik Seeley [mailto:yseeley@gmail.com]
Skickat: to 2006-05-18 17:43
Till: java-user@lucene.apache.org
Ämne: Re: Sort problematics



On 5/18/06, Marcus Falck <marcus.falck@observer.se> wrote:
> But since my "real" index will be around 2TB in size I don't think sorting is the right
way to go? I pretty sure I will have to modify the ranking.

They are both sorts, and they both use a priority queue.  The
differences shouldn't be that great after the FieldCache is populated.
 The biggest downside to the FieldCache is the memory usage, not the
CPU.

> And yes the data must be instantly available.

For each update?  If so, use a database - Lucene made different
tradeoffs in it's design.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org





Mime
View raw message