lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2369) Locale-based sort by field with low memory overhead
Date Tue, 31 Aug 2010 15:09:54 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904651#action_12904651 ]

Robert Muir commented on LUCENE-2369:
-------------------------------------

bq. No tests with 100M documents yet, but 1½ hour for build and 1.5GB of RAM would be the
expected requirement.

Toke, have you tried doing this 'build' at index time instead? I would recommend applying
LUCENE-2551 and indexing with ICU Collation, strength=primary.

Now that we can mostly do everything as bytes, I think the slow functionality for doing collation
and range queries at 'runtime' might soon be on its way out of Lucene (see patches on LUCENE-2514).

Instead, I think it's better to encourage users to index their content accordingly for the
use cases they need.
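
To illustrate the index-time idea, here is a minimal sketch. It is not the LUCENE-2551 patch: it swaps in the JDK's java.text.Collator for ICU Collation, and the class name CollationKeySketch and the helper sortKey are hypothetical. Each term is turned into a binary collation key once, and later comparisons are plain unsigned byte comparisons with no collator involved.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CollationKeySketch {
    // Hypothetical helper: computes the binary sort key once, at "index time".
    // Uses the JDK collator as a stand-in for ICU Collation (an assumption).
    static byte[] sortKey(Collator collator, String term) {
        return collator.getCollationKey(term).toByteArray();
    }

    public static void main(String[] args) {
        // The locale is chosen only for illustration.
        Collator collator = Collator.getInstance(Locale.forLanguageTag("da"));
        // strength=primary: case and accent differences are ignored.
        collator.setStrength(Collator.PRIMARY);

        byte[] apple = sortKey(collator, "apple");
        byte[] banana = sortKey(collator, "banana");

        // At "search time" only bytes are compared; the sign of the byte
        // comparison agrees with the sign of collator.compare.
        int byBytes = Integer.signum(Arrays.compareUnsigned(apple, banana));
        int byCollator = Integer.signum(collator.compare("apple", "banana"));
        System.out.println(byBytes == byCollator);
    }
}
```

The collation cost is paid once per term when the key is built, which is the trade-off suggested above: the content is indexed for the sort order it needs, rather than collated at search time.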


> Locale-based sort by field with low memory overhead
> ---------------------------------------------------
>
>                 Key: LUCENE-2369
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2369
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>            Reporter: Toke Eskildsen
>            Priority: Minor
>
> The current implementation of locale-based sort in Lucene uses the FieldCache which keeps
all sort terms in memory. Besides the huge memory overhead, searching requires comparison of
terms with collator.compare every time, making searches with millions of hits fairly expensive.
> This proposed alternative implementation is to create a packed list of pre-sorted ordinals
for the sort terms and a map from document-IDs to entries in the sorted ordinals list. This
results in very low memory overhead and faster sorted searches, at the cost of increased startup-time.
As the ordinals can be resolved to terms after the sorting has been performed, this approach
supports fillFields=true.
> This issue is related to https://issues.apache.org/jira/browse/LUCENE-2335 which contains
previous discussions on the subject.
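
The ordinal approach in the description above can be sketched roughly as follows. This is a simplified illustration, not the code attached to the issue: it assumes one term per document, uses a plain int[] where the proposal uses a packed structure, and folds the sorted-ordinals list and the document map into a single array. The names OrdinalSortSketch, buildDocToOrdinal and sortHits are hypothetical.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class OrdinalSortSketch {
    // Startup step: sort the distinct terms once with the collator, then
    // record each document's rank (ordinal) in that sorted order.
    static int[] buildDocToOrdinal(String[] termByDoc, Collator collator) {
        String[] sorted = Arrays.stream(termByDoc)
                .distinct()
                .sorted(collator)
                .toArray(String[]::new);
        Map<String, Integer> ordinalOf = new HashMap<>();
        for (int i = 0; i < sorted.length; i++) {
            ordinalOf.put(sorted[i], i);
        }
        int[] docToOrd = new int[termByDoc.length];
        for (int doc = 0; doc < termByDoc.length; doc++) {
            docToOrd[doc] = ordinalOf.get(termByDoc[doc]);
        }
        return docToOrd;
    }

    // Search step: hits are ordered by integer ordinal only, so
    // collator.compare is never called here.
    static int[] sortHits(int[] hits, int[] docToOrd) {
        return Arrays.stream(hits)
                .boxed()
                .sorted(Comparator.comparingInt(doc -> docToOrd[doc]))
                .mapToInt(Integer::intValue)
                .toArray();
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        String[] termByDoc = {"b", "a", "c", "a"}; // index = document ID
        int[] docToOrd = buildDocToOrdinal(termByDoc, collator);
        int[] sortedHits = sortHits(new int[] {0, 1, 2, 3}, docToOrd);
        System.out.println(Arrays.toString(sortedHits));
    }
}
```

All collator work happens in buildDocToOrdinal, which corresponds to the increased startup time mentioned in the description; sorting hits afterwards is integer comparison only. Resolving ordinals back to terms after sorting (the fillFields=true case) would mean keeping the sorted term list, which this sketch discards.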

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


