lucene-dev mailing list archives

From "Erick Erickson (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-9166) Export handler returns zero for numeric fields that are not in the original doc
Date Mon, 31 Oct 2016 03:52:58 GMT

     [ https://issues.apache.org/jira/browse/SOLR-9166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-9166:
---------------------------------
    Attachment: SOLR-9166.patch

[~rohitcse] I had some time today so started this patch.

Here's what I have so far. I got it to this point and ran into a few things I thought I'd
run by folks. There are lots of nocommits and the like currently, as well as new failing
tests. But it's progress....

[~yonik@apache.org] [~joel.bernstein] [~dpgove] I'd be particularly interested in your takes.

1> My base assumption is that sorting during export should return docs in the same order
as using the /select handler. Currently this doesn't happen; the new test I wrote fails all
over the place. I'm not quite sure why, but I just got all this to semi-work so I'm checkpointing.

2> I want to fold the two parameters into a single on/off returnDefaultsForMissing which
defaults to "false". This would mean there's really no way to get the old behavior where numerics
return zero and strings return null. Is that OK? I think it's easier to explain something
like: defaults for numerics are zero, the default for strings is "", the default for booleans
is "false" and the default for dates is in 1970. But see <4>.

3> Does it make any sense to support sortMissingFirst/Last? My initial take is "no" since
what matters is consistent sorting. That said, I started down that road before wondering whether
it was desirable, so this patch has sortMissingFirstLast in the test. It'll be removed unless
there are objections.

4> [~yonik@apache.org]: Your comment about using functions is interesting. I'll take a
look at that now that I have a clue what the problem is. It's certainly more elegant than
some new flag, I think, and it allows the user to put anything at all in rather than us deciding
what a "proper" default is. Do you have any advice on how to access the defined default for
the fields in SortingResponseWriter, since that's where I need to trap this? (being lazy here).

5> I @Ignored all the rest of the tests except the new one to be able to beast the new
stuff. They'll be un-ignored before committing.

6> Despite my comment on the dev list, after looking into this I don't think we want to
force it into 6.3, I think there'll be some ramifications we'll need to bake out.
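
To make the proposal in <2> concrete, here is a minimal Python sketch of what a single
on/off switch could mean for a client. It is not code from the patch. The type labels,
the fill_missing helper and the field names are hypothetical, reading "in 1970" as the
Unix epoch is an assumption, and returning null for every missing field when the switch
is off is also an assumption.

```python
from datetime import datetime, timezone

# Defaults as listed in <2>. The type labels are hypothetical, and
# treating "in 1970" as the Unix epoch is an assumption.
PROPOSED_DEFAULTS = {
    "numeric": 0,
    "string": "",
    "boolean": False,
    "date": datetime(1970, 1, 1, tzinfo=timezone.utc),
}


def fill_missing(doc, field_types, return_defaults_for_missing=False):
    """Return a copy of doc in which every field in field_types is present.

    With the switch off (the proposed default), a missing field becomes None.
    With the switch on, it becomes the per-type default from the table above.
    """
    filled = dict(doc)
    for field, field_type in field_types.items():
        if field not in filled:
            if return_defaults_for_missing:
                filled[field] = PROPOSED_DEFAULTS[field_type]
            else:
                filled[field] = None
    return filled


# Hypothetical schema and document, for illustration only.
field_types = {"qty": "numeric", "name": "string", "in_stock": "boolean"}
doc = {"id": "1", "name": "widget"}

with_defaults = fill_missing(doc, field_types, return_defaults_for_missing=True)
without_defaults = fill_missing(doc, field_types)
```

With the switch on, a client can no longer tell a stored zero from a missing value, which
is the trade-off <2> asks about.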

No doubt more later when we get some advice on how to continue.

> Export handler returns zero for numeric fields that are not in the original doc
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-9166
>                 URL: https://issues.apache.org/jira/browse/SOLR-9166
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Erick Erickson
>            Assignee: Rohit
>         Attachments: SOLR-9166.patch, SOLR-9166.patch
>
>
> From the dev list discussion:
> My original post.
> Zero is different from not
> existing. And let's claim that I want to process a stream and, say,
> facet on an integer field over the result set. There's no way on the
> client side to distinguish between a document that has a zero in the
> field and one that didn't have the field in the first place so I'll
> over-count the zero bucket.
> From Dennis Gove:
> Is this true for non-numeric fields as well? I agree that this seems like a very bad
> thing.
> I can't imagine that a fix would cause a problem with Streaming Expressions, ParallelSQL,
> or others, given that the /select handler is not returning 0 for these missing fields (the
> /select handler is the default handler for the Streaming API, so if nulls were a problem I
> imagine we'd have already seen it).
> That said, within Streaming Expressions there is a select(...) function which supports
> a replace(...) operation which allows you to replace one value (or null) with some other
> value. If a 0 were necessary one could use a select(...) to replace null with 0 using an
> expression like this:
>    select(<stream>, replace(fieldA, null, withValue=0))
> The end result of that would be that the field fieldA would never have a null value, and
> for all tuples where a null value existed it would be replaced with 0.
> Details on the select function can be found at
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61330338#StreamingExpressions-select.
> And to answer Dennis's question, null gets returned for string DocValues fields.
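
The over-counting described in the original post can be illustrated with a small
client-side sketch in Python. It does not call Solr. The documents, the "qty" field and
the two export_* helpers are hypothetical stand-ins for the two behaviors under
discussion (missing numerics exported as 0 versus as null).

```python
from collections import Counter

# Hypothetical result set: "qty" is a real zero in doc 1 and absent from doc 2.
docs = [{"id": "1", "qty": 0}, {"id": "2"}, {"id": "3", "qty": 5}]


def export_with_zero_default(docs, field):
    """Stand-in for the reported behavior: a missing numeric comes back as 0."""
    return [{**d, field: d.get(field, 0)} for d in docs]


def export_with_null(docs, field):
    """Stand-in for the desired behavior: a missing numeric comes back as None."""
    return [{**d, field: d.get(field)} for d in docs]


def facet(tuples, field):
    """Count tuples per value of field, skipping tuples where it is None."""
    return Counter(t[field] for t in tuples if t[field] is not None)


zero_facets = facet(export_with_zero_default(docs, "qty"), "qty")
null_facets = facet(export_with_null(docs, "qty"), "qty")

print(zero_facets[0], null_facets[0])
```

In the first case the zero bucket also absorbs the document that never had the field. In
the second case the client can skip it, or map null to 0 itself when that is what it wants.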



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

