lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8878) Provide alternative sorting utility from SortField other than FieldComparator
Date Thu, 27 Jun 2019 18:54:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874425#comment-16874425 ]

Robert Muir commented on LUCENE-8878:
-------------------------------------

Yes, please don't let me discourage you from attempting to simplify the API.

I just wanted to point out that for a search engine, there are totally valid use-cases for
the sort comparator to exploit the priority queue to go faster. I think the distance one is
"reasonable" in that sense.

The comparison-by-ordinal stuff we do for strings is more extreme; it is kind of a separate
issue from that. It's related, but there might be other ways to do it and still have good performance.
I know there was a lot of investigation and benchmarking in past JIRA issues on that.
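
To make the point about the priority queue concrete, here is a minimal sketch, not Lucene code. It assumes points on a plane instead of geo points, and the class and method names are made up for illustration. Once the queue holds k hits, the worst kept distance ("bottom") gives a cheap per-axis rejection test, so the exact distance is only computed for documents that could still be competitive.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical illustration only; none of these names exist in Lucene. */
public class BottomBoundSketch {
    /** Counts exact distance computations, purely to show the savings. */
    static int exactComputations = 0;

    /** Returns the k smallest distances from (qx, qy), in ascending order. */
    static double[] topKNearest(double[][] points, double qx, double qy, int k) {
        // Max-heap: the head is the worst (largest) distance currently kept.
        PriorityQueue<Double> queue = new PriorityQueue<>(Comparator.reverseOrder());
        for (double[] p : points) {
            if (queue.size() == k) {
                double bottom = queue.peek();
                // Cheap rejection: if either axis offset already exceeds the
                // bottom distance, the true distance must exceed it as well.
                if (Math.abs(p[0] - qx) > bottom || Math.abs(p[1] - qy) > bottom) {
                    continue;
                }
            }
            exactComputations++;
            double d = Math.hypot(p[0] - qx, p[1] - qy);
            if (queue.size() < k) {
                queue.add(d);
            } else if (d < queue.peek()) {
                queue.poll();   // evict the current worst hit
                queue.add(d);
            }
        }
        // Polling a max-heap yields largest first, so fill from the end.
        double[] result = new double[queue.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = queue.poll();
        }
        return result;
    }
}
```

A comparator that only exposes "compare two values" has no way to use the bottom value like this, which is why the queue-aware hook matters for this kind of sort.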

> Provide alternative sorting utility from SortField other than FieldComparator
> -----------------------------------------------------------------------------
>
>                 Key: LUCENE-8878
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8878
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 8.1.1
>            Reporter: Tony Xu
>            Priority: Major
>
> The `FieldComparator` has many responsibilities and users get all of them at once. At a
> high level, the main functionalities of `FieldComparator` are:
>  * Provide LeafFieldComparator
>  * Allocate storage for requested number of hits
>  * Read the values from DocValues/Custom source etc.
>  * Compare two values
> There are three major areas for improvement:
>  # The logic for reading values and the logic for storing them are coupled.
>  # Users need to specify the size in order to create a `FieldComparator`, but sometimes
> the size is unknown upfront.
>  # From `FieldComparator`'s API, one can't reason about thread-safety, so it is not suitable
> for concurrent search.
>  E.g., can two concurrent threads use the same `FieldComparator` to call `getLeafComparator`
> for two different segments they are working on? In fact, almost all existing implementations
> of `FieldComparator` are not thread-safe.
> The proposal is to enhance `SortField` with two APIs:
>  # `int compare(Object v1, Object v2)` – this compares two
> values from different docs for this field.
>  # `ValueAccessor newValueAccessor(LeafReaderContext leaf)` – this
> encapsulates the logic for obtaining the right implementation in order to read the field values.
>  `ValueAccessor` should be accessed in a similar way to `DocValues`, providing the sort
> value for a document in an advance-and-read fashion.
> With this API, hopefully we can reduce memory usage compared with `FieldComparator`,
> because today users store either the sort values or at least the slot number in addition to the storage
> allocated by `FieldComparator` itself. Ideally, only one copy of the values should be stored.
> The proposed API is also friendlier to concurrent search since it provides the `ValueAccessor`
> per leaf. Although the same `ValueAccessor` can't be shared if more than one thread is
> working on the same leaf, at least each thread can initialize its own `ValueAccessor`.
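
The shape of the proposed API can be sketched as follows. This is an illustration under stated assumptions, not the reporter's patch: `ValueAccessor` does not exist in Lucene, the method names on it are guesses modeled on doc-values iterators, a `Long[]` stands in for `LeafReaderContext`, and `SortSpec` stands in for the two methods that would be added to `SortField`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the proposed API; none of these types exist in Lucene. */
public class ValueAccessorSketch {

    /** Advance-and-read access to one leaf's sort values. */
    interface ValueAccessor {
        /** Positions on docId; returns false if the document has no value. */
        boolean advanceExact(int docId);

        /** Sort value of the current document; valid only after advanceExact returned true. */
        Object value();
    }

    /** Stand-in for the two methods the proposal adds to SortField. */
    interface SortSpec {
        int compare(Object v1, Object v2);

        /** Assumption: a Long[] (null = missing) replaces LeafReaderContext here. */
        ValueAccessor newValueAccessor(Long[] leafValues);
    }

    /** Ascending sort on a long-valued field. */
    static final SortSpec LONG_ASC = new SortSpec() {
        @Override
        public int compare(Object v1, Object v2) {
            return Long.compare((Long) v1, (Long) v2);
        }

        @Override
        public ValueAccessor newValueAccessor(Long[] leafValues) {
            // Each call returns a fresh accessor with its own cursor state.
            return new ValueAccessor() {
                private int current = -1;

                @Override
                public boolean advanceExact(int docId) {
                    current = docId;
                    return leafValues[docId] != null;
                }

                @Override
                public Object value() {
                    return leafValues[current];
                }
            };
        }
    };

    /** The caller owns the storage: exactly one copy of each value is kept. */
    static List<Object> sortedValues(SortSpec spec, Long[][] leaves) {
        List<Object> values = new ArrayList<>(); // no size needed upfront
        for (Long[] leaf : leaves) {
            ValueAccessor accessor = spec.newValueAccessor(leaf);
            for (int doc = 0; doc < leaf.length; doc++) {
                if (accessor.advanceExact(doc)) {
                    values.add(accessor.value());
                }
            }
        }
        values.sort(spec::compare);
        return values;
    }
}
```

In this sketch the comparison logic is stateless and all mutable state lives in the accessor, so a concurrent searcher could create one accessor per thread per leaf and share only the `SortSpec`.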




