lucene-dev mailing list archives

From "jian chen" <>
Subject Re: Large scale sorting
Date Thu, 12 Apr 2007 00:17:24 GMT
I agree. This falls into the area where a technical limit has been reached. Time to
modify the spec.

I have thought about this issue over the past couple of days, and there is really NO
silver bullet. If the field is a multi-valued field and there are not too many
distinct values, you might reduce memory usage by storing the field
as a bitset, with each bit corresponding to a distinct value.
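
To make the bitset idea concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class name MultiValueBitsetField and its methods are made up for illustration, and it assumes the field has at most 64 distinct values so that one long per document is enough.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Lucene): stores a multi-valued field as one
// 64-bit mask per document, assuming at most 64 distinct values overall.
class MultiValueBitsetField {
    private final Map<String, Integer> ordinals = new HashMap<>(); // value -> bit position
    private final List<String> values = new ArrayList<>();         // bit position -> value
    private final long[] docMasks;                                 // one mask per document

    MultiValueBitsetField(int maxDoc) {
        docMasks = new long[maxDoc];
    }

    // Records that document docId carries the given value.
    void add(int docId, String value) {
        Integer ord = ordinals.get(value);
        if (ord == null) {
            if (values.size() == Long.SIZE) {
                throw new IllegalStateException("more than 64 distinct values");
            }
            ord = values.size();
            ordinals.put(value, ord);
            values.add(value);
        }
        docMasks[docId] |= 1L << ord;
    }

    // True if document docId carries the given value.
    boolean has(int docId, String value) {
        Integer ord = ordinals.get(value);
        return ord != null && (docMasks[docId] & (1L << ord)) != 0;
    }

    // Decodes the mask back into the values of document docId,
    // in the order the values were first seen.
    List<String> valuesOf(int docId) {
        List<String> result = new ArrayList<>();
        long mask = docMasks[docId];
        for (int ord = 0; ord < values.size(); ord++) {
            if ((mask & (1L << ord)) != 0) {
                result.add(values.get(ord));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        MultiValueBitsetField colors = new MultiValueBitsetField(2);
        colors.add(0, "red");
        colors.add(0, "blue");
        colors.add(1, "blue");
        System.out.println(colors.valuesOf(0));
        System.out.println(colors.has(1, "red"));
    }
}
```

With this layout the per-document cost is a fixed 8 bytes no matter how many values a document has, plus one small shared table of the distinct values. With more than 64 distinct values you would need a wider mask per document (for example a java.util.BitSet), and the saving shrinks accordingly.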

But either way, you have to load the whole thing into memory for good


On 4/10/07, Chris Hostetter <> wrote:
> : I'm wondering then if the Sorting infrastructure could be refactored
> : to allow  with some sort of policy/strategy where one can choose a
> : point where one is not willing to use memory for sorting, but willing
>         ...
> : To accomplish this would require a substantial change to the
> : FieldSortHitQueue et al, and I realize that the use of NIO
>
> I don't follow ... why couldn't this be implemented entirely via a new
> SortComparatorSource?  (you would also need something to create your file,
> but that could probably be done as a decorator or subclass of IndexWriter,
> couldn't it?)
>
> : immediately pins Lucene to Java 1.4, so I'm sure this is
> : controversial.  But, if we wish Lucene to go beyond where it is now,
>
> Java 1.5 is controversial; Lucene already has 1.4 dependencies.
>
> : I think we need to start thinking about this particular problem
> : sooner rather than later.
>
> it depends on your timeline, Lucene's gotten pretty far with what it's
> got.  Personally i'm banking on RAM getting cheaper fast enough that I
> won't ever need to worry about this.
>
> If i needed to support sorting on lots of fields with lots of different
> locales, and my index was big enough that i couldn't feasibly keep all of
> the FieldCaches in memory on one box, i wouldn't partition the index
> across multiple boxes and merge results with a MultiSearcher ... i'd clone
> the index across multiple boxes and partition the traffic based on the
> field/locale it's searching on.
>
> it's a question of cache management, if i know i have two very different
> use cases for a Solr index, i partition those use cases to separate tiers
> of machines to get better cache utilization, FieldCache is
> just another type of cache.
>
> -Hoss