lucene-dev mailing list archives

From "Toke Eskildsen (JIRA)"
Subject [jira] Commented: (LUCENE-2369) Locale-based sort by field with low memory overhead
Date Wed, 01 Sep 2010 13:33:54 GMT


Toke Eskildsen commented on LUCENE-2369:

> ICU keys are just byte[] just like regular terms. they are "regular terms"

Do they or do they not need to be loaded into heap in order to be used for sorted search?

> Can we forget about the stupid runtime Locale sort, if you have a way to improve memory usage
> for byte[] terms, let's look just at that? Then this could be more general and more useful.

Easy now. The whole runtime-vs-index-time issue is something that I don't care much about
at this point. Pre-sorting can be done both at index and search time. Let's just say that
we do it at index-time and go from there.

Not holding the sort-terms in memory (whether they be Strings, BytesRefs, regular terms or
ICU keys) and doing all possible sorting up front (in the case of a hybrid ICU-approach: A
merge-sort of the already sorted segments), is what I'm looking at. Could you please re-read
my comment with that in mind and see if my breakdown and trade-off lists make sense? It seems
to me that you're quite certain that there is something I've missed, but I haven't yet understood
what it is. I do know that ICU keys are just regular terms in the technical sense. When I
use the designation ICU keys, I do it to make it clear that we're getting locale-specific

Deep breaths, ok? I'm going to fetch the kids from school, so you don't need to rush your

> Locale-based sort by field with low memory overhead
> ---------------------------------------------------
>                 Key: LUCENE-2369
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>            Reporter: Toke Eskildsen
>            Priority: Minor
> The current implementation of locale-based sort in Lucene uses the FieldCache, which keeps
> all sort terms in memory. Besides the huge memory overhead, searching requires comparison of
> terms with every time, making searches with millions of hits fairly expensive.
> This proposed alternative implementation creates a packed list of pre-sorted ordinals
> for the sort terms and a map from document IDs to entries in the sorted ordinals list. This
> results in very low memory overhead and faster sorted searches, at the cost of increased startup time.
> As the ordinals can be resolved to terms after the sorting has been performed, this approach
> supports fillFields=true.
> This issue is related to which contain
> previous discussions on the subject.
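
The two structures named in the issue description can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the code
attached to LUCENE-2369 and not a Lucene API: the class OrdinalSortSketch, its
fields and its methods are hypothetical names, plain int[] arrays stand in for
the packed lists, and the terms are held as Strings purely to keep the example
small.

```java
import java.text.Collator;
import java.util.Arrays;

/**
 * Hypothetical sketch of sorting by pre-sorted ordinals.
 * Not the LUCENE-2369 patch and not a Lucene API; plain int[] arrays
 * stand in for the packed structures mentioned in the issue.
 */
final class OrdinalSortSketch {
    final String[] terms;        // unique sort terms, in index (ordinal) order
    final int[] sortedOrdinals;  // term ordinals arranged in collator order
    final int[] docToEntry;      // document ID -> position in sortedOrdinals

    OrdinalSortSketch(String[] terms, int[] docTermOrd, Collator collator) {
        this.terms = terms;

        // Startup cost: every collator comparison happens here, once.
        Integer[] ords = new Integer[terms.length];
        for (int i = 0; i < ords.length; i++) {
            ords[i] = i;
        }
        Arrays.sort(ords, (a, b) -> collator.compare(terms[a], terms[b]));

        sortedOrdinals = new int[ords.length];
        int[] ordToEntry = new int[ords.length]; // inverse of sortedOrdinals
        for (int entry = 0; entry < ords.length; entry++) {
            sortedOrdinals[entry] = ords[entry];
            ordToEntry[ords[entry]] = entry;
        }

        // Each document points at its term's position in the sorted list.
        docToEntry = new int[docTermOrd.length];
        for (int doc = 0; doc < docTermOrd.length; doc++) {
            docToEntry[doc] = ordToEntry[docTermOrd[doc]];
        }
    }

    /** Sorts hits using integer comparisons only; no collator calls. */
    int[] sortHits(int[] hits) {
        Integer[] boxed = new Integer[hits.length];
        for (int i = 0; i < hits.length; i++) {
            boxed[i] = hits[i];
        }
        Arrays.sort(boxed, (a, b) -> Integer.compare(docToEntry[a], docToEntry[b]));
        int[] sorted = new int[hits.length];
        for (int i = 0; i < sorted.length; i++) {
            sorted[i] = boxed[i];
        }
        return sorted;
    }

    /** Resolves the term after sorting, as fillFields=true would need. */
    String termForDoc(int doc) {
        return terms[sortedOrdinals[docToEntry[doc]]];
    }
}
```

With terms {"cherry", "Banana", "apple"}, per-document term ordinals
{1, 0, 2, 1} and Collator.getInstance(Locale.ENGLISH), the constructor does
all locale-aware comparisons up front, and sortHits then orders documents by
comparing ints only. In this sketch the terms array is still held in memory;
the point of the issue is that only the ordinal structures need to be, with
terms resolved after sorting.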

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
