Date: Fri, 27 Jun 2014 08:05:25 +0000 (UTC)
From: "Robert Stupp (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-7395) Support for pure user-defined functions (UDF)

    [ https://issues.apache.org/jira/browse/CASSANDRA-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045690#comment-14045690 ]

Robert Stupp commented on CASSANDRA-7395:
-----------------------------------------

Some questions:
* Type parsing in C* is programmatically only possible from _String_ to _AbstractType_. Parsing of CQL3 types is done by _Cql.q_, which "constructs" the AbstractType. Is it ok to limit type names to the _AbstractType_ syntax? That said, I've added some simple "CQL3 parsing" using CQL3Types.Native.valueOf().
* Shall UDFs support list/set/map/udf/tuple types - even nested types?
It makes the current approach of using Java types in UDFs somewhat complicated. An intermediate solution might be to just pass the ByteBuffer, but that would not be consistent. Using list/set/map with _primitive_ types is not a big deal. I think that these "high level" types are a bit out of scope for pure UDFs.
* Passing "any" type to a UDF (the UDF gets a _TypeAndData_ class instance that contains the AbstractType + ByteBuffer) would require changing the {{Function.execute(List))}} signature. Is this feature worth that change? I'm a bit skeptical about its benefit.
* Is the approach of loading UDF bundles (jar files) into the C* {{system_udf}} keyspace using a tool ok?
* If it's ok, then I'd add a "byte code scanner" that prevents loading of "evil" code (usage of classes like Thread, Runtime, ProcessBuilder, etc). By default such bundles would be rejected, but the user could override that with a command line switch.

I could go on and write some unit tests for UDFs.

I forgot to mention that the CQL syntax for UDFs in the second version is:
{{ '::' '(' ')' }}

(Senseless) examples:
{noformat}
cqlsh> select id, num, demo::sin(demo::cos(num)) from foo.demo;

 id | num | demo__sin_demo__cos_num
----+-----+-------------------------
  1 |   1 |                  0.5144

cqlsh> select id, num, demo::sin(demo::random()) from foo.demo;

 id | num | demo__sin_demo__random
----+-----+------------------------
  1 |   1 |                0.13712

(1 rows)
{noformat}

UDFs with two or more arguments (e.g. min(a,b), max(a,b)) naturally work.
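
For readers who want to see the namespaced call shape from the examples in isolation, here is a minimal Java sketch. It is not taken from the attached patches: the class {{PureUdfRegistry}} and its {{register}} and {{call}} methods are hypothetical names, and the sketch assumes every argument and result is a double, which sidesteps the type questions raised above. It only illustrates resolving {{demo::sin}}-style names to pure functions and nesting the calls.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical registry of pure functions keyed by "namespace::name". */
final class PureUdfRegistry {
    // Assumption: all arguments and results are doubles.
    private final Map<String, Function<double[], Double>> functions = new HashMap<>();

    /** Registers a function body under its qualified name. */
    void register(String namespace, String name, Function<double[], Double> body) {
        functions.put(namespace + "::" + name, body);
    }

    /** Looks up the qualified name and applies the function to the arguments. */
    double call(String qualifiedName, double... args) {
        Function<double[], Double> body = functions.get(qualifiedName);
        if (body == null) {
            throw new IllegalArgumentException("Unknown function: " + qualifiedName);
        }
        return body.apply(args);
    }

    public static void main(String[] args) {
        PureUdfRegistry udfs = new PureUdfRegistry();
        // Pure functions: the result depends only on the input values.
        udfs.register("demo", "sin", a -> Math.sin(a[0]));
        udfs.register("demo", "cos", a -> Math.cos(a[0]));
        udfs.register("demo", "max", a -> Math.max(a[0], a[1]));

        double num = 1.0;
        // Mirrors demo::sin(demo::cos(num)) for num = 1.
        double nested = udfs.call("demo::sin", udfs.call("demo::cos", num));
        System.out.println(String.format(Locale.ROOT, "%.4f", nested)); // prints 0.5144
        System.out.println(udfs.call("demo::max", 2.0, 3.0));           // prints 3.0
    }
}
```

A function like {{demo::random()}} would not qualify as pure in this sense, since its result does not depend only on its inputs.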

The current status (not changed heavily from the second patch) is in [github|https://github.com/snazy/cassandra/tree/7395]

> Support for pure user-defined functions (UDF)
> ---------------------------------------------
>
>                 Key: CASSANDRA-7395
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7395
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>             Fix For: 3.0
>
>         Attachments: 7395-v2.diff, 7395.diff
>
>
> We have some tickets for various aspects of UDF (CASSANDRA-4914, CASSANDRA-5970, CASSANDRA-4998) but they all suffer from various degrees of ocean-boiling.
> Let's start with something simple: allowing pure user-defined functions in the SELECT clause of a CQL query. That's it.
> By "pure" I mean, must depend only on the input parameters. No side effects. No exposure to C* internals. Column values in, result out. http://en.wikipedia.org/wiki/Pure_function

--
This message was sent by Atlassian JIRA
(v6.2#6252)