lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2335) optimization: when sorting by field, if index has one segment and field values are not needed, do not load String[] into field cache
Date Sat, 20 Mar 2010 08:05:27 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847744#action_12847744 ]

Michael McCandless commented on LUCENE-2335:
--------------------------------------------

bq.  If this is true, there is no need for stored fields: We can get the Strings from the
indexed fields instead, as we keep track of the ordinals for the Terms.

Meaning, after we're done collecting hits for this segment, you'd make a 2nd pass to resolve
the ord -> value for all docs that made the cut?  This may be slowish?  Or would you somehow
try to do it only at the end, ie for only docs that made the cut across all segments?

We'd probably want to change the API somehow to bulk-load the ords, so that we'd make a single
forward sweep (ie, visit the ords in order).
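
A rough sketch of that single forward sweep, in plain Java and not against any real Lucene
API: OrdResolver and resolve() are hypothetical names, and the Iterator<String> is only a
stand-in for a term enumeration that yields the field's terms in ord order, starting at ord 0.
The idea is to sort the hits by ord first, so the terms are visited strictly in order and
never revisited.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical helper, not part of Lucene: resolves ord -> value for the
// docs that made the cut, using one forward pass over the sorted terms.
final class OrdResolver {

    /**
     * @param ords  ords of the hits to resolve (duplicates allowed)
     * @param terms term values in ord order, first element is ord 0 (assumption)
     * @return values aligned with the input array: result[i] is the term for ords[i]
     */
    static String[] resolve(int[] ords, Iterator<String> terms) {
        // Sort the hit positions by ord so the term iterator only moves forward.
        Integer[] idx = new Integer[ords.length];
        for (int i = 0; i < idx.length; i++) {
            idx[i] = i;
        }
        Arrays.sort(idx, (a, b) -> Integer.compare(ords[a], ords[b]));

        String[] values = new String[ords.length];
        int currentOrd = -1;       // ord of the term we are currently positioned on
        String currentTerm = null; // stays null until the first term is read
        for (int i : idx) {
            int target = ords[i];
            // Advance the sweep until we reach the requested ord.
            while (currentOrd < target) {
                if (!terms.hasNext()) {
                    throw new IllegalArgumentException("ord out of range: " + target);
                }
                currentTerm = terms.next();
                currentOrd++;
            }
            values[i] = currentTerm;
        }
        return values;
    }
}
```

The cost is one sort of the hits plus at most one pass over the terms up to the largest ord
requested, whether this is done per segment or only once at the end for the final hits.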

> optimization: when sorting by field, if index has one segment and field values are not
needed, do not load String[] into field cache
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2335
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2335
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1
>
>
> Spinoff from java-dev thread "Sorting with little memory: A suggestion", started by Toke
Eskildsen.
> When sorting by SortField.STRING we currently ask FieldCache for a StringIndex on that
field.
> This can consume tons of RAM, when the values are mostly unique (eg a title field),
as it populates both int[] ords as well as String[] values.
> But, if the index is only one segment, and the search sets fillFields=false, we don't
need the String[] values, just the int[] ords.  If the app needs to show the fields it can
pull them (for the 1 page) from stored fields.
> This can be a potent optimization -- a lot of RAM saved -- for optimized indexes.
> When fixing this we must take care to share the int[] ords if some queries do fillFields=true
and some =false... ie, FieldCache will be called twice and it should share the int[] ords
across those invocations.
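
One way the sharing described above might look, as a minimal sketch only: SharedOrdsCache,
Entry and Loader are hypothetical names, not the actual FieldCache API, and a real cache
would presumably key on the reader as well as the field. The int[] ords are loaded once per
field, and the String[] values are attached lazily the first time a caller asks for them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache, not Lucene's FieldCache: shares one int[] ords per field
// between callers that need values (fillFields=true) and those that don't.
final class SharedOrdsCache {

    // One cached entry per field; values stays null until somebody needs it.
    static final class Entry {
        final int[] ords;
        String[] values;

        Entry(int[] ords) {
            this.ords = ords;
        }
    }

    // Stand-in for whatever actually un-inverts the field (assumption).
    interface Loader {
        int[] loadOrds(String field);

        String[] loadValues(String field);
    }

    private final Map<String, Entry> cache = new HashMap<>();
    private final Loader loader;

    SharedOrdsCache(Loader loader) {
        this.loader = loader;
    }

    // Returns the entry for the field, loading the ords at most once and the
    // values only when needValues is true.
    synchronized Entry get(String field, boolean needValues) {
        Entry entry = cache.get(field);
        if (entry == null) {
            entry = new Entry(loader.loadOrds(field));
            cache.put(field, entry);
        }
        if (needValues && entry.values == null) {
            entry.values = loader.loadValues(field);
        }
        return entry;
    }
}
```

With this shape, a fillFields=false search followed by a fillFields=true search on the same
field sees the same int[] instance, and the String[] is only paid for by the second call.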

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

