cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10783) Allow literal value as parameter of UDF & UDA
Date Wed, 01 Jun 2016 18:28:59 GMT


Sylvain Lebresne commented on CASSANDRA-10783:

Actually, I think we should deal with this before dealing with CASSANDRA-7396 since the latter
basically needs terms in selections and that's what this introduce.

There is a few things that I think the patch doesn't deal properly with however:
* It sets a {{null}} name and type in the {{ColumnSpecification}} of markers.  That just doesn't
work. The only reason unit test don't blow up is that they don't actually exercise serializing
the resulting metadata but said serialization would NPE. Besides, clients need the type to
know what to send.
* The way the grammar is written is imo too restrictive. It doesn't allow collection literals
inside function calls for instance ({{f({ 'foo' : 'bar' })}} doesn't even parse (it parses
if you add a cast before the literal, but that shouldn't be required)), which feels unecesssarily
restrictive. Further, I'm generally not too fan of trying to eliminate too much through the
parser as the error message ends up being confusing imo. For instance, while {{SELECT 1 FROM
...}} or {{SELECT f\(?\) FROM ...}} is allowed, {{SELECT ? FROM ...}} would throw a parsing
exception. Don't get me wrong, {{SELECT ? FROM ...}} is something we have to reject, but I'd
rather reject it with a clear message saying that we can't infer the type and thus require
a cast.
* The fact we only prepare and bind the term in {{TermSelector.getOutput()}} is a bit silly,
as it means we'll redo said preparation/binding work (which, with functions can involve computation)
for every output row even if it's constant once bound. In general, I feel that the right place
to prepare/bind the terms is in the {{Selector.Factory.newInstance()}} method (which should
thus take the {{QueryOptions}}).

Now, fixing those points require quite a few changes and I took a stab at it in the patch
attached below. A few points worth noting on that patch:
* It makes "json" and "distinct" invalid (user) function names. It was impossible for the
parser to distinguish if {{ SELECT JSON (1, 2, 3) FROM ..}} was a {{SELECT JSON}} followed
by a tuple-literal, or a call to a user-defined "json" function with 3 ints arguments (same
for distinct). We could, alternatively, forbid literals for {{DISTINCT}} and {{JSON}}, thus
forcing the parser the recognize the query above as function calls, but if anything, that
feels like the inverse of what users would expect. This would also complicate the parser and
that doesn't feel worth it. Allowing those as valid UDF names just feels like a mistake in
the first place and I'd rather fix it now.
* It makes ColumnDefinition a Selectable (instead of ColumnIdentifier), which is both needed
by the changes so we can easily get the type of a Selectable (when possible) but also make
sense restrospectively. This does change a few error messages by having the check if a given
column exists in only one place instead of scattered around. It also switch to ByteBuffer
to handle UDT fields instead of ColumnIdentifier, mostly because the latter was not imo a
good idea.
* We do have to special case {{COUNT(1)}} for now, but I've simplified that special casing
a bit.
* It reuses the tests from Robert's patch, but with some modifications to make sure, for instance,
that the metadata for bind markers is properly set.
* There is actually 2 commits: the first one allow things like {{SELECT 1 FROM ...}}, defaulting
{{1}} to be of type {{varint}}, much like in Robert's patch.  Thinking about it though, I
had a slight change of heart and removed that behavior, making the query throw an error (saying
it can't infer the exact type). The reason is that I'm bothered with defaulting to {{varint}}
since it's partly random and not always the most convenient. So I figured, shouldn't we take
some more time to think about this and left it out initially. After all, it's not really a
big deal, {{SELECT 1 FROM...}} is not a very useful query and you can always do {{SELECT (int)1
FROM ...}} if you really want. But if people feel confident enough that defaulting to {{varint}}
is the only sensible choice and should be allowed now, then allowing it is as easy as ignoring
the 2nd commit on the branch.

|| [trunk|] || [utests|]
|| [dtests|] ||

> Allow literal value as parameter of UDF & UDA
> ---------------------------------------------
>                 Key: CASSANDRA-10783
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>            Reporter: DOAN DuyHai
>            Assignee: Robert Stupp
>            Priority: Minor
>              Labels: CQL3, UDF, client-impacting, doc-impacting
>             Fix For: 3.x
> I have defined the following UDF
> {code:sql}

> RETURNS int 
> LANGUAGE java 
> AS  'return Math.max(current,testValue);'
> CREATE TABLE maxValue(id int primary key, val int);
> INSERT INTO maxValue(id, val) VALUES(1, 100);
> SELECT maxOf(val, 101) FROM maxValue WHERE id=1;
> {code}
> I got the following error message:
> {code}
> SyntaxException: <ErrorMessage code=2000 [Syntax error in CQL query] message="line
1:19 no viable alternative at input '101' (SELECT maxOf(val1, [101]...)">
> {code}
>  It would be nice to allow literal value as parameter of UDF and UDA too.
>  I was thinking about an use-case for an UDA groupBy() function where the end user can
*inject* at runtime a literal value to select which aggregation he want to display, something
similar to GROUP BY ... HAVING <filter clause>

This message was sent by Atlassian JIRA

View raw message