lucene-dev mailing list archives

From "Erick Erickson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-9166) Export handler returns zero for numeric fields that are not in the original doc
Date Thu, 10 Nov 2016 20:10:58 GMT

    [ https://issues.apache.org/jira/browse/SOLR-9166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655010#comment-15655010 ]

Erick Erickson commented on SOLR-9166:
--------------------------------------

Well, certainly not the final patch for 6x. Siiiggggh.

[~mikemccand][~jpountz][~rcmuir] (and others). Just to check what I think I'm seeing...

In trunk, ExportWriter has a bunch of clauses like:
`
    NumericDocValues vals = DocValues.getNumeric(reader, this.field);
    if (vals.advance(docId) == docId) {
      val = vals.longValue();
    } else {
      val = 0;
    }
    ew.put(field, val);
`
but 6x just looks like:
`
    NumericDocValues vals = DocValues.getNumeric(reader, this.field);
    long val = vals.get(docId);
    ew.put(field, val);
`

and vals.get(docId) returns zero when a docValues field isn't in the document. 
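
To make the distinction concrete, here is a minimal, self-contained sketch. It does not use Lucene at all: NumericColumn is a hypothetical stand-in for a numeric docValues column, with a BitSet playing the role of a "docs with field" bitmap. If I remember the 6x API correctly, DocValues.getDocsWithField(reader, field) exposes that kind of bitmap, but treat that as an assumption to verify rather than a statement of how the fix should be written.

```java
import java.util.BitSet;
import java.util.OptionalLong;

/** Hypothetical stand-in for a numeric docValues column; not a Lucene class. */
final class NumericColumn {
    private final long[] values;         // one slot per docId; stays 0 when nothing was indexed
    private final BitSet docsWithField;  // bit is set only for docs that really have a value

    NumericColumn(int maxDoc) {
        values = new long[maxDoc];
        docsWithField = new BitSet(maxDoc);
    }

    void add(int docId, long value) {
        values[docId] = value;
        docsWithField.set(docId);
    }

    /** Mirrors the 6x-style read: a missing doc is indistinguishable from a stored 0. */
    long get(int docId) {
        return values[docId];
    }

    /** Presence-aware read: empty when the doc has no entry for the field. */
    OptionalLong getIfPresent(int docId) {
        return docsWithField.get(docId)
            ? OptionalLong.of(values[docId])
            : OptionalLong.empty();
    }

    public static void main(String[] args) {
        NumericColumn col = new NumericColumn(3);
        col.add(0, 42L);
        col.add(1, 0L);  // a real zero
        // doc 2 intentionally has no value
        for (int docId = 0; docId < 3; docId++) {
            System.out.println(docId + ": get=" + col.get(docId)
                + " present=" + col.getIfPresent(docId).isPresent());
        }
    }
}
```

With a presence check like getIfPresent, a writer could skip the put entirely for docs that have no value instead of emitting 0.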

So my question is: "Would you agree that returning nothing rather than zero for docValues
fields that have no entry for a particular doc would require a lot of work in 6x?"

I know there was a whole long discussion about this on the LUCENE Jira list some time ago
but the resolution kind of escapes me and the patch is huge.

Thanks.

> Export handler returns zero for numeric fields that are not in the original doc
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-9166
>                 URL: https://issues.apache.org/jira/browse/SOLR-9166
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Erick Erickson
>            Assignee: Rohit
>         Attachments: SOLR-9166.patch, SOLR-9166.patch, SOLR-9166.patch, SOLR-9166.patch
>
>
> From the dev list discussion:
> My original post.
> Zero is different from not
> existing. And let's claim that I want to process a stream and, say,
> facet on an integer field over the result set. There's no way on the
> client side to distinguish between a document that has a zero in the
> field and one that didn't have the field in the first place so I'll
> over-count the zero bucket.
> From Dennis Gove:
> Is this true for non-numeric fields as well? I agree that this seems
> like a very bad thing.
> I can't imagine that a fix would cause a problem with Streaming
> Expressions, ParallelSQL, or other given that the /select handler is
> not returning 0 for these missing fields (the /select handler is the
> default handler for the Streaming API so if nulls were a problem I
> imagine we'd have already seen it).
> That said, within Streaming Expressions there is a select(...) function
> which supports a replace(...) operation which allows you to replace one
> value (or null) with some other value. If a 0 were necessary one could
> use a select(...) to replace null with 0 using an expression like this
>    select(<stream>, replace(fieldA, null, withValue=0)).
> The end result of that would be that the field fieldA would never have
> a null value and for all tuples where a null value existed it would be
> replaced with 0.
> Details on the select function can be found at https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61330338#StreamingExpressions-select.
> And to answer Dennis' question, null gets returned for string DocValues fields.
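
The over-counting described in the issue can be shown with a small client-side sketch. Everything here is hypothetical (ZeroBucketDemo and facet are illustration names, not Solr or Streaming API code): the same four documents are faceted twice, once with missing values delivered as null and once with them delivered as 0.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical client-side facet over an exported integer field. */
final class ZeroBucketDemo {

    /** Counts documents per value; null means the field was absent, so it is skipped. */
    static Map<Long, Integer> facet(List<Long> values) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (Long v : values) {
            if (v == null) {
                continue;  // no value for this doc: belongs in no bucket
            }
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same four docs: one real 0, one 5, two docs without the field.
        List<Long> missingAsNull = Arrays.asList(0L, 5L, null, null);
        List<Long> missingAsZero = Arrays.asList(0L, 5L, 0L, 0L);

        System.out.println("missing as null: " + facet(missingAsNull));
        System.out.println("missing as zero: " + facet(missingAsZero));
    }
}
```

When missing values arrive as 0, the client has nothing left to tell them apart from real zeros, so the zero bucket absorbs both.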



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

