cassandra-commits mailing list archives

From "Robert Stupp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7395) Support for pure user-defined functions (UDF)
Date Fri, 27 Jun 2014 08:05:25 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045690#comment-14045690 ]

Robert Stupp commented on CASSANDRA-7395:
-----------------------------------------

Some questions:
* Type parsing in C* is programmatically only possible from _String_ to _AbstractType_. Parsing
of CQL3 types is done by _Cql.g_, which "constructs" the AbstractType. Is it ok to limit type
names to the _AbstractType_ syntax? I have, however, added some simple "CQL3 parsing" using
CQL3Types.Native.valueOf().
* Shall UDFs support list/set/map/UDT/tuple types, including nested types? That would make the
current approach of using Java types in UDFs somewhat complicated. An intermediate solution might
be to just pass the ByteBuffer, but that would not be consistent. Using list/set/map with
_primitive_ types is not a big deal. I think these "high level" types are a bit out of scope
for pure UDFs.
* Passing "any" type to a UDF (the UDF gets a _TypeAndData_ class instance that contains the AbstractType
+ ByteBuffer) would require changing the {{Function.execute(List<ByteBuffer>)}} signature.
Is this feature worth that change? I'm a bit skeptical about its benefit.
* Is the approach of loading UDF bundles (jar files) into the C* {{system_udf}} keyspace using
a tool ok?
* If it's ok, then I'd add some "byte code scanner" that prevents loading of "evil" code (usage
of classes like Thread, Runtime, ProcessBuilder, etc). By default such bundles would be rejected
- but the user could override with a command line switch.
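
The "byte code scanner" idea from the last bullet could be sketched roughly as follows. This is an illustration under stated assumptions, not code from the patch: the class name {{UdfBytecodeScanner}}, its methods and the deny list are hypothetical. It walks the constant pool of a compiled class and reports references to the classes named above. A check like this cannot see classes that are looked up reflectively by name, so it would be a first line of defence at best.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: names and deny list are assumptions, not part of the patch.
final class UdfBytecodeScanner {
    // Internal (slash-separated) names of classes a pure UDF should not reference.
    static final Set<String> FORBIDDEN = new HashSet<>(Arrays.asList(
            "java/lang/Thread", "java/lang/Runtime", "java/lang/ProcessBuilder"));

    private UdfBytecodeScanner() {}

    // Returns the names of all classes referenced from the constant pool of one class file.
    static Set<String> referencedClasses(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort();                 // minor version
            in.readUnsignedShort();                 // major version
            int count = in.readUnsignedShort();     // constant_pool_count
            String[] utf8 = new String[count];
            int[] classNameIndex = new int[count];
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1:  utf8[i] = in.readUTF(); break;                      // Utf8
                    case 7:  classNameIndex[i] = in.readUnsignedShort(); break;  // Class
                    case 8: case 16: case 19: case 20:
                        in.skipBytes(2); break;     // String, MethodType, Module, Package
                    case 15: in.skipBytes(3); break; // MethodHandle
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.skipBytes(4); break;     // Integer, Float, member refs, NameAndType, (Invoke)Dynamic
                    case 5: case 6:
                        in.skipBytes(8); i++; break; // Long and Double occupy two pool slots
                    default:
                        throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            Set<String> names = new TreeSet<>();
            for (int idx : classNameIndex) {
                if (idx != 0 && utf8[idx] != null) {
                    names.add(utf8[idx]);
                }
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the forbidden classes that the given class file references.
    static Set<String> violations(byte[] classFile) {
        Set<String> found = referencedClasses(classFile);
        found.retainAll(FORBIDDEN);
        return found;
    }
}
```

A loading tool could reject a bundle as soon as {{violations(...)}} is non-empty for any class in the jar, unless the override switch is given.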

I could go on and write some unit tests for UDFs.

Forgot to mention that the CQL syntax for UDFs in the second version is: {{ <bundle-name> '::' <udf-name> '(' <parameter...> ')' }}

(Senseless) examples:
{noformat}
cqlsh> select id, num, demo::sin(demo::cos(num)) from foo.demo;
 id | num | demo__sin_demo__cos_num
----+-----+-------------------------
  1 |   1 |                  0.5144

cqlsh> select id, num, demo::sin(demo::random()) from foo.demo;
 id | num | demo__sin_demo__random
----+-----+------------------------
  1 |   1 |                0.13712
(1 rows)
{noformat}

UDFs with two or more arguments (e.g. min(a,b), max(a,b)) naturally work.
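
For illustration, a bundle such as {{demo}} might boil down to a class of static methods on plain Java types. The class name {{DemoUdfs}}, the method set and the idea that one bundle maps to one class are assumptions made for this sketch, not the layout used by the patch. Each function depends only on its arguments and has no side effects.

```java
// Hypothetical sketch of a "demo" bundle: pure functions on plain Java types.
final class DemoUdfs {
    private DemoUdfs() {}

    // One-argument functions, as used in the nested call above.
    static double sin(double x) { return Math.sin(x); }
    static double cos(double x) { return Math.cos(x); }

    // Two-argument functions.
    static double min(double a, double b) { return Math.min(a, b); }
    static double max(double a, double b) { return Math.max(a, b); }
}
```

Under these assumptions, {{DemoUdfs.sin(DemoUdfs.cos(1.0))}} evaluates to roughly 0.5144, which matches the first query result shown above.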

The current status (not changed heavily from the second patch) is in [github|https://github.com/snazy/cassandra/tree/7395]

> Support for pure user-defined functions (UDF)
> ---------------------------------------------
>
>                 Key: CASSANDRA-7395
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7395
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>             Fix For: 3.0
>
>         Attachments: 7395-v2.diff, 7395.diff
>
>
> We have some tickets for various aspects of UDF (CASSANDRA-4914, CASSANDRA-5970, CASSANDRA-4998) but they all suffer from various degrees of ocean-boiling.
> Let's start with something simple: allowing pure user-defined functions in the SELECT clause of a CQL query.  That's it.
> By "pure" I mean, must depend only on the input parameters.  No side effects.  No exposure to C* internals.  Column values in, result out.  http://en.wikipedia.org/wiki/Pure_function



--
This message was sent by Atlassian JIRA
(v6.2#6252)
