cassandra-commits mailing list archives

From "Tyler Hobbs (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8374) Better support of null for UDF
Date Tue, 23 Dec 2014 20:26:14 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14257470#comment-14257470 ]

Tyler Hobbs commented on CASSANDRA-8374:
----------------------------------------

bq. If you admit that throwing an error means the function is broken, then surely that means
people should unbreak their function by returning null when they realize it is broken.

I think you may be overlooking the option of doing something other than returning null or
raising an error when a null argument is passed.  There are definitely cases where it can
make sense to, say, return some default value on null.

bq. Can you give me a few examples of functions that might make sense to add to our hardcoded
functions and for which throwing an exception on null would be reasonable, knowing that it
would basically mean the function can't be used in select clauses?

Again, I'm not saying that throwing an exception on null is the best behavior.  I think it's
a good way to _alert_ users to the fact that their function is _broken_.  Fixing the function
does _not_ necessarily mean changing it to return null on null input.

An example function off the top of my head: say you're calculating a ratio of two columns.
It can make sense to return 0 instead of null when one of the two columns is null.  I admit
that returning null could be okay, too; I just have a preference for making the user explicitly
aware of that behavior.  It sounds like you disagree, which is okay.  I'm still -1 on making
{{RETURNS NULL ON NULL INPUT}} the default, but you have reasonable arguments against that,
so I'll leave it up to you.
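
To make the ratio example concrete, here is a minimal sketch of the three behaviors under
discussion: raising an error, returning null, and returning a default. It is plain Java, not
Cassandra's UDF API, and the class and method names are hypothetical.

```java
class RatioPolicies {
    // Policy 1: fail loudly, alerting the user that null input was not handled.
    static Double ratioStrict(Integer a, Integer b) {
        if (a == null || b == null) {
            throw new IllegalArgumentException("null argument passed to ratio");
        }
        return a.doubleValue() / b.doubleValue();
    }

    // Policy 2: the RETURNS NULL ON NULL INPUT style, where null in means null out.
    static Double ratioNullOnNull(Integer a, Integer b) {
        if (a == null || b == null) {
            return null;
        }
        return a.doubleValue() / b.doubleValue();
    }

    // Policy 3: substitute a default (0 here) when either column is null.
    static Double ratioDefaultZero(Integer a, Integer b) {
        if (a == null || b == null) {
            return 0.0;
        }
        // Floating-point division: a zero denominator gives Infinity or NaN, not an exception.
        return a.doubleValue() / b.doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(ratioStrict(1, 4));         // prints 0.25
        System.out.println(ratioNullOnNull(null, 4));  // prints null
        System.out.println(ratioDefaultZero(null, 4)); // prints 0.0
    }
}
```

Only the third policy requires the function body to see the null, which is why it cannot be
expressed if null arguments are always short-circuited before the call.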

bq. I think the proper default behavior for aggregate function is to ignore rows that have
nulls.

Agreed.

bq. my argument is what you say: RETURNS implies that NULL is used as the return value, which
is just not true because the state isn't updated then. Other databases generally ignore any
NULL input for aggregates (e.g. Oracle documents that explicitly) - so that could be the way
to go: never call a state function for any NULL argument (and leaving the syntax as proposed).

I think that's confusing the behavior of functions and aggregates too much.  The
{{RETURNS NULL ON NULL}} option definitely makes sense for functions.  The behavior of
aggregates for such functions is separate (and I agree with Sylvain that Postgres' aggregate
behavior for strict/RNON makes sense).
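
As a rough illustration of the aggregate behavior discussed above (rows with nulls are
ignored, so the state function is never called with a null argument), here is a minimal
sketch in plain Java. The class and method names are hypothetical, and this is not how
Cassandra or Postgres implement aggregates.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

class NullSkippingAggregate {
    // Folds the rows into a state, skipping null rows so that stateFn never sees a null input.
    static Integer aggregate(List<Integer> rows, Integer initialState,
                             BinaryOperator<Integer> stateFn) {
        Integer state = initialState;
        for (Integer row : rows) {
            if (row == null) {
                continue; // null row: leave the state unchanged
            }
            state = stateFn.apply(state, row);
        }
        return state;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, null, 3);
        System.out.println(aggregate(rows, 0, Integer::sum)); // prints 4
    }
}
```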

> Better support of null for UDF
> ------------------------------
>
>                 Key: CASSANDRA-8374
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8374
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Robert Stupp
>             Fix For: 3.0
>
>         Attachments: 8473-1.txt, 8473-2.txt
>
>
> Currently, every function needs to deal with its argument potentially being {{null}}.
There are very many cases where that's just annoying; users should be able to define a function
like:
> {noformat}
> CREATE FUNCTION addTwo(val int) RETURNS int LANGUAGE JAVA AS 'return val + 2;'
> {noformat}
> without having this crash as soon as a column it's applied to doesn't have a value for
some rows (I'll note that this definition apparently cannot be compiled currently, which should
be looked into).
> In fact, I think that by default methods shouldn't have to care about {{null}} values:
if the value is {{null}}, we should not call the method at all and return {{null}}. There
are still methods that may explicitly want to handle {{null}} (to return a default value for
instance), so maybe we can add an {{ALLOW NULLS}} to the creation syntax.
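
The default proposed in the description (do not call the method on {{null}}, just return
{{null}}) can be sketched as a wrapper around the function body. This is a plain Java
illustration with hypothetical names, not Cassandra's actual UDF execution path.

```java
import java.util.function.Function;

class NullShortCircuit {
    // Wraps fn so that it is never invoked with null; a null argument yields null directly.
    static <A, R> Function<A, R> returnsNullOnNull(Function<A, R> fn) {
        return arg -> arg == null ? null : fn.apply(arg);
    }

    public static void main(String[] args) {
        // Equivalent of 'return val + 2;' and it would throw NullPointerException on null.
        Function<Integer, Integer> addTwo = val -> val + 2;
        Function<Integer, Integer> safeAddTwo = returnsNullOnNull(addTwo);

        System.out.println(safeAddTwo.apply(40));   // prints 42
        System.out.println(safeAddTwo.apply(null)); // prints null
    }
}
```

A function that wants to handle {{null}} itself would simply be used without the wrapper.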



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
