lucene-java-user mailing list archives

From "Marcus Falck" <>
Subject Re: Sort problematics
Date Thu, 18 May 2006 20:25:23 GMT
Where can I read more about the Lucene sort implementation?
Is there any documentation on sorting other than the Lucene API docs?


From: Yonik Seeley []
Sent: Thu 2006-05-18 20:39
Subject: Re: Sort problematics

On 5/18/06, Marcus Falck <> wrote:
> I'm well aware of the trade-offs. But if you were aware of the large amounts of data
> that this system should be able to search, you wouldn't propose the usage of a database.

If you have a hard requirement of instantly seeing any update, you
can't use Lucene.  That's more database-like functionality. That's
why I asked.

> Since I have a separate alert service for immediate alerts up and running, I may be
> able to make trade-offs on data availability timing and hold the IndexSearcher open
> for a longer period.

That's pretty much a requirement for using Lucene to support a decent
query rate.

> But still, the memory is the problem.
> I mean, how much memory would the FieldCache take for 500 million newsletter articles?
> Probably a lot.
> OK, the system is scaled out over different machines, so in reality each machine won't
> have 500 million docs, but maybe around 100 million.

Depends on what you are sorting by... for an int/float it's 100M * 4 bytes, or
400MB.  Big, but possible.
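To make that arithmetic concrete, here is a back-of-the-envelope sketch. It assumes the sort cache holds exactly one 4-byte int or float per document and ignores JVM array overhead and any other caches; the class and method names are made up for illustration and are not part of Lucene.

```java
// Rough estimate only; FieldCacheEstimate is a hypothetical helper, not a Lucene class.
final class FieldCacheEstimate {
    /** Bytes needed to hold one fixed-width sort value per document. */
    static long sortArrayBytes(long numDocs, int bytesPerValue) {
        return numDocs * bytesPerValue;
    }

    public static void main(String[] args) {
        // Assumption: int and float sort values both take 4 bytes per document.
        long perMachine = sortArrayBytes(100_000_000L, 4);
        long wholeCorpus = sortArrayBytes(500_000_000L, 4);
        System.out.println(perMachine / 1_000_000 + " MB per machine");
        System.out.println(wholeCorpus / 1_000_000 + " MB for the whole corpus");
    }
}
```

Each additional int/float sort field adds another array of the same size, so the estimate scales linearly with the number of sort fields.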

> So I'm still interested in changing the relevance.
> Any ideas?

Depends on what you are sorting by, and how many different ways you
want to sort.  If it's a single sort criterion, you can use index-time
boosts.  If you need to sort multiple ways, avoiding the FieldCache
probably won't help you, because retrieving the per-doc sort
info via term vectors or stored fields will take too long.
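As a rough illustration of the index-time boost idea, the sketch below maps a document timestamp to a boost so that newer articles score higher. The helper name, the linear mapping, and the [1.0, 2.0] range are all assumptions made for the example; in Lucene of this era the value would be passed to `Document.setBoost(float)` before the document is added to the index.

```java
// Illustrative helper only; RecencyBoost is hypothetical and not part of Lucene.
final class RecencyBoost {
    /** Maps a timestamp linearly onto [1.0, 2.0]; newer documents get larger boosts. */
    static float recencyBoost(long docMillis, long oldestMillis, long newestMillis) {
        if (newestMillis <= oldestMillis) {
            return 1.0f; // degenerate range: express no preference
        }
        // Clamp so out-of-range timestamps cannot produce boosts outside [1.0, 2.0].
        long clamped = Math.max(oldestMillis, Math.min(docMillis, newestMillis));
        double fraction = (double) (clamped - oldestMillis) / (newestMillis - oldestMillis);
        return (float) (1.0 + fraction);
    }

    public static void main(String[] args) {
        long oldest = 0L;
        long newest = 1_000_000L;
        // The result would be handed to Document.setBoost(float) at index time.
        System.out.println(recencyBoost(750_000L, oldest, newest));
    }
}
```

Two caveats apply. A boost is multiplied into the relevance score, so it biases the ranking rather than imposing a strict sort order. Lucene also stores the resulting norm in a single byte, so closely spaced boost values may collapse to the same stored value.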

-Yonik Solr, the open-source Lucene search server
