lucene-dev mailing list archives

From "Toke Eskildsen (JIRA)" <>
Subject [jira] Commented: (LUCENE-2335) optimization: when sorting by field, if index has one segment and field values are not needed, do not load String[] into field cache
Date Tue, 23 Mar 2010 07:12:27 GMT


Toke Eskildsen commented on LUCENE-2335:

I can see that I misread your previous answer regarding stored fields. Let's just
forget it so as not to confuse the issue further.

As for facets, they are equivalent to sorting in that resolving the actual Strings
can be delayed until the very end. I'll try to contain myself on the facet subject and focus
on sorting though.

I have spent some time tinkering with the problem of spanning multiple segments and it seems
to me that generating a "global" list of sorted ordinals should be feasible without
too much overhead. Basically we want to preserve sequential access as much as possible, so
merging sorted ordinals from segments will benefit from a read-ahead cache. By letting the
reader deliver ordinals through an iterator, it is free to implement such a cache when necessary.
I envision the signature to be something like
Iterator<OrdinalTerm> getOrdinalTerms(
      String persistenceKey, Comparator<Object> comparator, String field,
      boolean collectDocIDs) throws IOException;
where OrdinalTerm contains ordinal, Term and docID.
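
To illustrate the merge step, here is a minimal sketch of how per-segment streams, each already sorted by the comparator, could be combined into one global ordinal list with a priority queue. Everything in it is hypothetical and not part of Lucene: the class names, the SegmentTerm helper, the field layout of OrdinalTerm, the use of Comparator<String> instead of Comparator<Object>, and the assumption that docIDs are already index-wide. It returns a List rather than an Iterator to keep it short, and it leaves out the read-ahead cache and the persistence key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not Lucene code: merges per-segment term streams
// (each already sorted by the comparator) into one globally ordered list.
class GlobalOrdinalMerge {

    // Stand-in for the OrdinalTerm mentioned above; field names are assumed.
    static final class OrdinalTerm {
        final int ordinal;  // rank of the term in the global sort order
        final String term;
        final int docID;    // assumed to be an index-wide docID already

        OrdinalTerm(int ordinal, String term, int docID) {
            this.ordinal = ordinal;
            this.term = term;
            this.docID = docID;
        }
    }

    // One (term, docID) entry as a single segment would deliver it.
    static final class SegmentTerm {
        final String term;
        final int docID;

        SegmentTerm(String term, int docID) {
            this.term = term;
            this.docID = docID;
        }
    }

    // Wraps a segment iterator and remembers its current head element.
    private static final class Cursor {
        final Iterator<SegmentTerm> source;
        SegmentTerm head;

        Cursor(Iterator<SegmentTerm> source) {
            this.source = source;
            this.head = source.next();
        }

        boolean advance() {
            if (!source.hasNext()) {
                return false;
            }
            head = source.next();
            return true;
        }
    }

    static List<OrdinalTerm> merge(List<List<SegmentTerm>> segments,
                                   final Comparator<String> comparator) {
        // The queue always exposes the segment whose head term sorts first.
        PriorityQueue<Cursor> queue = new PriorityQueue<>(
                (a, b) -> comparator.compare(a.head.term, b.head.term));
        for (List<SegmentTerm> segment : segments) {
            if (!segment.isEmpty()) {
                queue.add(new Cursor(segment.iterator()));
            }
        }

        List<OrdinalTerm> merged = new ArrayList<>();
        int ordinal = -1;
        String previous = null;  // terms are assumed to be non-null
        while (!queue.isEmpty()) {
            Cursor cursor = queue.poll();
            SegmentTerm entry = cursor.head;
            // Equal terms from different segments share one global ordinal.
            if (previous == null || comparator.compare(previous, entry.term) != 0) {
                ordinal++;
                previous = entry.term;
            }
            merged.add(new OrdinalTerm(ordinal, entry.term, entry.docID));
            if (cursor.advance()) {
                queue.add(cursor);
            }
        }
        return merged;
    }
}
```

Each segment is only ever read front to back, which is the sequential access pattern that a read-ahead cache behind the iterator could exploit.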

The beauty of all this is that the mapping is from docID->sortedOrdinal index (which it
has to be for fast comparison), so keeping the possibility of resolving the Strings after
the sort (fillFields=true) is free in terms of storage space and processing time.
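
As a rough sketch of why this is cheap: the sort itself only compares ints taken from the docID->ordinal array, and Strings are resolved afterwards for just the hits that are actually shown. The names below (OrdinalSortSketch, sortHits, resolve, termLookup) are made up for illustration and are not Lucene API; termLookup stands in for whatever resolves an ordinal back to its term, for example a lookup on disk.

```java
import java.util.Arrays;
import java.util.function.IntFunction;

// Hypothetical sketch, not Lucene code.
class OrdinalSortSketch {

    // Orders the hits by global ordinal; no String is touched during the sort.
    static Integer[] sortHits(int[] hitDocIDs, final int[] docToOrdinal) {
        Integer[] sorted = new Integer[hitDocIDs.length];
        for (int i = 0; i < hitDocIDs.length; i++) {
            sorted[i] = hitDocIDs[i];
        }
        Arrays.sort(sorted,
                (a, b) -> Integer.compare(docToOrdinal[a], docToOrdinal[b]));
        return sorted;
    }

    // Resolves the terms for the first pageSize hits only (the fillFields=true case).
    static String[] resolve(Integer[] sortedDocIDs, int[] docToOrdinal,
                            IntFunction<String> termLookup, int pageSize) {
        int n = Math.min(pageSize, sortedDocIDs.length);
        String[] terms = new String[n];
        for (int i = 0; i < n; i++) {
            terms[i] = termLookup.apply(docToOrdinal[sortedDocIDs[i]]);
        }
        return terms;
    }
}
```

With fillFields=false the second step is simply skipped, so both modes can share the same int[] of ordinals.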

I hope to have a patch out soon for SegmentReader so that it is possible to perform a sorted
search "the Lucene way" rather than the hack I use in my proof of concept. However, vacation
starts Friday...

> optimization: when sorting by field, if index has one segment and field values are not
needed, do not load String[] into field cache
> ------------------------------------------------------------------------------------------------------------------------------------
>                 Key: LUCENE-2335
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1
> Spinoff from java-dev thread "Sorting with little memory: A suggestion", started by Toke
> When sorting by SortField.STRING we currently ask FieldCache for a StringIndex on that
> This can consume tons of RAM when the values are mostly unique (e.g. a title field),
as it populates both int[] ords as well as String[] values.
> But, if the index is only one segment, and the search sets fillFields=false, we don't
need the String[] values, just the int[] ords.  If the app needs to show the fields it can
pull them (for the 1 page) from stored fields.
> This can be a potent optimization -- a lot of RAM saved -- for optimized indexes.
> When fixing this we must take care to share the int[] ords if some queries do fillFields=true
and some =false... ie, FieldCache will be called twice and it should share the int[] ords
across those invocations.
