lucene-dev mailing list archives

From "Hoss Man (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1372) Proposal: introduce more sensible sorting when a doc has multiple values for a term
Date Fri, 05 Sep 2008 05:45:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628555#action_12628555 ]

Hoss Man commented on LUCENE-1372:
----------------------------------

bq. We'd sort the list of attributes so that it would appear as "apple,zebra".

Again I'm missing something in your argument ... you'll put code in your application which
will change the order of stored fields when displaying them so it looks better, but you won't
put code in your application to ensure that multiple values aren't indexed in the first place?

The application using Lucene is in the best position to decide "this is the value I want to
sort on." FieldCache shouldn't guess which value to use if the application breaks the rules
and indexes more than one.  The fact that FieldCache currently picks the last one is just
an artifact of how it was implemented ... it is "consistent" but "undefined" behavior.

If we are going to change the behavior, we should change it to be an error.
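
To make the "it should be an error" option concrete, here is a minimal sketch in plain Java. It is not Lucene code: the class StrictFieldFillSketch, the method fillStrict, and the term-to-doc-ids map standing in for the index are all hypothetical names chosen for illustration. It assumes the cache is filled by walking terms in sorted order, and it fails as soon as a document would receive a second value.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch, not part of Lucene: reject multi-valued sort fields outright.
class StrictFieldFillSketch {

    // postings maps each term to the ids of the docs that contain it (an assumed
    // stand-in for walking a TermEnum and its TermDocs).
    static String[] fillStrict(SortedMap<String, int[]> postings, int maxDoc) {
        String[] retArray = new String[maxDoc];
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                if (retArray[doc] != null) {
                    // A second term for the same doc: refuse to guess which one to sort on.
                    throw new IllegalStateException("doc " + doc
                            + " has more than one value: " + retArray[doc]
                            + ", " + entry.getKey());
                }
                retArray[doc] = entry.getKey();
            }
        }
        return retArray;
    }

    public static void main(String[] args) {
        SortedMap<String, int[]> postings = new TreeMap<>();
        postings.put("apple", new int[] {1});
        postings.put("banana", new int[] {2});
        // Slot 0 is unused because the example doc ids start at 1.
        System.out.println(Arrays.toString(fillStrict(postings, 3)));

        postings.put("zebra", new int[] {1}); // doc 1 now has two values
        try {
            fillStrict(postings, 3);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this approach the application finds out at cache-fill time that it indexed more than one value, rather than getting a sort order it did not choose.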

> Proposal: introduce more sensible sorting when a doc has multiple values for a term
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1372
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1372
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.3.2
>            Reporter: Paul Cowan
>            Priority: Minor
>         Attachments: lucene-multisort.patch
>
>
> At the moment, FieldCacheImpl has somewhat disconcerting behaviour when sorting on a field
> for which multiple values exist for one document. For example, imagine a field "fruit" which
> is added to a document multiple times, with the values as follows:
> doc 1: {"apple"}
> doc 2: {"banana"}
> doc 3: {"apple", "banana"}
> doc 4: {"apple", "zebra"}
> If one sorts on the field "fruit", the loop in FieldCacheImpl.stringsIndexCache.createValue()
> (and similarly for the other methods in the various FieldCacheImpl caches) does the following:
>           while (termDocs.next()) {
>             retArray[termDocs.doc()] = t;
>           }
> which means that we loop over the terms in their natural order and, for each one, overwrite
> retArray[doc] with that term for every document containing it. Effectively, this overwriting
> means that a string sort in this circumstance will sort by the LAST term lexicographically,
> so the docs above will effectively be sorted as if they had the single values ("apple", "banana",
> "banana", "zebra"), which is nonintuitive. To change this to sort on the first term in the
> TermEnum seems relatively trivial and low-overhead; while it's not perfect (it's not locale-aware,
> for example) the behaviour seems much more sensible to me. Interested to see what people think.
> Patch to follow.
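
The difference between the two fill strategies described in the issue can be sketched in plain Java. This is an illustration only, not the attached patch and not Lucene code: the class MultiValueSortSketch, its methods, and the sorted term-to-doc-ids map that stands in for a TermEnum/TermDocs walk are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch, not part of Lucene: last-term-wins versus first-term-wins.
class MultiValueSortSketch {

    // The "fruit" example from the issue: term -> ids of the docs containing it.
    // A TreeMap iterates its keys in natural (lexicographic) order.
    static SortedMap<String, int[]> examplePostings() {
        SortedMap<String, int[]> postings = new TreeMap<>();
        postings.put("apple", new int[] {1, 3, 4});
        postings.put("banana", new int[] {2, 3});
        postings.put("zebra", new int[] {4});
        return postings;
    }

    // Behaviour described in the issue: every later term overwrites the slot.
    static String[] fillLastWins(SortedMap<String, int[]> postings, int maxDoc) {
        String[] retArray = new String[maxDoc];
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                retArray[doc] = entry.getKey();
            }
        }
        return retArray;
    }

    // Proposed behaviour: keep the first term seen for each doc.
    static String[] fillFirstWins(SortedMap<String, int[]> postings, int maxDoc) {
        String[] retArray = new String[maxDoc];
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                if (retArray[doc] == null) {
                    retArray[doc] = entry.getKey();
                }
            }
        }
        return retArray;
    }

    public static void main(String[] args) {
        SortedMap<String, int[]> postings = examplePostings();
        // Slot 0 is unused because the example doc ids start at 1.
        System.out.println(Arrays.toString(fillLastWins(postings, 5)));
        // prints [null, apple, banana, banana, zebra]
        System.out.println(Arrays.toString(fillFirstWins(postings, 5)));
        // prints [null, apple, banana, apple, apple]
    }
}
```

Under the first-term rule, docs 1, 3 and 4 all sort as "apple", which matches what a reader would expect from the lexicographically smallest value.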

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

