lucene-dev mailing list archives

From "Gilad Barkai (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-3390) Incorrect sort by Numeric (double) values for documents missing the sorting field
Date Mon, 22 Aug 2011 11:37:29 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-3390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gilad Barkai updated LUCENE-3390:
---------------------------------

    Description: 
While sorting results over a numeric double field, documents that do not contain a value
for the sorting field appear to be assigned the value 0 (zero) in the sort.
This behavior is unexpected, because zero is "comparable" to the rest of the values, so such
documents are ranked among the documents that do have values. A better solution would be either
to let the user define a default for such a "non-value", or to always place those documents
last in the results.

Example scenario:
Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
Searching with MatchAllDocsQuery, with sort over that field in descending order yields the
docid results of 0, 2, 1.

While the document with the missing value does match the query, I would expect it to come
last, as it is not comparable to the other documents. For example, asking for the top 2 documents
returns the document without any value, which seems like a bug.
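
The two orderings discussed above can be reproduced without Lucene. The following is a minimal sketch in plain Java, not Lucene's implementation: the class and method names are hypothetical, and a null entry stands in for a document that has no value in the sort field. It contrasts a descending sort that silently substitutes 0 for a missing value with one that always places such documents last.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names come from Lucene.
class MissingValueSortSketch {

    // Sort-field values indexed by doc ID; null marks a document without a value.
    static final Double[] VALUES = {3.5, -10.0, null};

    // Descending sort in which a missing value is treated as 0.0 (the reported behavior).
    static List<Integer> sortDescMissingAsZero(Double[] values) {
        List<Integer> docIds = docIds(values.length);
        docIds.sort(Comparator
                .comparingDouble((Integer id) -> values[id] == null ? 0.0 : values[id])
                .reversed());
        return docIds;
    }

    // Descending sort in which documents without a value always come last (the expected behavior).
    static List<Integer> sortDescMissingLast(Double[] values) {
        List<Integer> docIds = docIds(values.length);
        Comparator<Double> descNullsLast = Comparator.nullsLast(Comparator.<Double>reverseOrder());
        docIds.sort(Comparator.comparing((Integer id) -> values[id], descNullsLast));
        return docIds;
    }

    // Builds the list of doc IDs 0..count-1.
    private static List<Integer> docIds(int count) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add(i);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println("missing as zero: " + sortDescMissingAsZero(VALUES)); // [0, 2, 1]
        System.out.println("missing last:    " + sortDescMissingLast(VALUES));   // [0, 1, 2]
    }
}
```

With the first comparator, a request for the top 2 documents would include doc 2, which has no value; with the second it would not.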

  was:
While sorting results over a numeric double field, documents that do not contain a value
for the sorting field appear to be assigned the value 0 (zero) in the sort.
This behavior is unexpected, because zero is "comparable" to the rest of the values, so such
documents are ranked among the documents that do have values. A better solution would be either
to let the user define a default for such a "non-value", or to always place those documents
last in the results.

Example scenario:
Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
Searching with MatchAllDocsQuery, with sort over that field in descending order yields the
docid results of 0, 2, 1.

Example code:
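import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Class name taken from the attachment name, SortByDouble.java
public class SortByDouble {
	public static void main(String[] args) throws Exception {
		RAMDirectory d = new RAMDirectory();
		IndexWriter w = new IndexWriter(d, new IndexWriterConfig(Version.LUCENE_33, new KeywordAnalyzer()));

		// 1st doc (docid 0), value 3.5d
		Document doc = new Document();
		doc.add(new NumericField("f", Store.YES, true).setDoubleValue(3.5d));
		w.addDocument(doc);

		// 2nd doc (docid 1), value of -10d
		doc = new Document();
		doc.add(new NumericField("f", Store.YES, true).setDoubleValue(-10d));
		w.addDocument(doc);

		// 3rd doc (docid 2), no value at all
		w.addDocument(new Document());
		w.close();

		IndexSearcher s = new IndexSearcher(d);
		// Sort on field "f" as a double; 'true' requests descending (reverse) order
		Sort sort = new Sort(new SortField("f", SortField.DOUBLE, true));
		TopDocs td = s.search(new MatchAllDocsQuery(), 10, sort);
		// As reported above, the docids come back in the order 0, 2, 1
		for (ScoreDoc sd : td.scoreDocs) {
			System.out.println(sd.doc + ": " + s.doc(sd.doc).get("f"));
		}
		s.close();
		d.close();
	}
}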
 


> Incorrect sort by Numeric (double) values for documents missing the sorting field
> ---------------------------------------------------------------------------------
>
>                 Key: LUCENE-3390
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3390
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.3
>            Reporter: Gilad Barkai
>            Priority: Minor
>              Labels: double, numeric, sort
>         Attachments: SortByDouble.java
>
>
> While sorting results over a numeric double field, documents that do not contain a value
> for the sorting field appear to be assigned the value 0 (zero) in the sort.
> This behavior is unexpected, because zero is "comparable" to the rest of the values, so such
> documents are ranked among the documents that do have values. A better solution would be either
> to let the user define a default for such a "non-value", or to always place those documents
> last in the results.
> Example scenario:
> Adding 3 documents, 1st with value 3.5d, 2nd with -10d, and 3rd without any value.
> Searching with MatchAllDocsQuery, with sort over that field in descending order yields
> the docid results of 0, 2, 1.
> While the document with the missing value does match the query, I would expect it to
> come last, as it is not comparable to the other documents. For example, asking for the top
> 2 documents returns the document without any value, which seems like a bug.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

