cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tyler Hobbs (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-7395) Support for pure user-defined functions (UDF)
Date Fri, 01 Aug 2014 23:01:41 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083132#comment-14083132
] 

Tyler Hobbs edited comment on CASSANDRA-7395 at 8/1/14 11:01 PM:
-----------------------------------------------------------------

> Application level conflicts may arise if new 'core CQL' functions appear in a future
C* version with the same name of existing UDFs. We should make that clear in documentation
and encourage users to use namespaces. The only alternative that came to my mind was to force
users to use SELECT ::sin(foo)... for non-namespace UDFs - but I'm not happy with that alternative.

I think encouraging the use of namespaces is the best way to go.  Perhaps on startup we could
check for any non-namespaced UDFs that would be inaccessible due to a collision with a (new)
core function name and log a warning?  At that point, the user could define the function within
a namspace.

Comments on the patch:

* UFMetaData:
** Bad javadoc for the class
** Redundant work in constructors
** compatibleArgs():
*** argType should not be null (no TypeAndData)
*** To fix the IRE case, make AssignementTestable.isAssignableTo() not throw an IRE,
      add validateAssignableTo() which does throw an IRE
** parseCQLType(): want InvalidRequestException instead of ConfigurationException
** dropFunction(): use ColumnFamily.delete() to remove the whole partition instead of individual
rows
*** Do we need a way to drop a specific signature and not all versions of a function?
** adding toString() would be nice, as it's expected by the logger in a couple of places
  
* MigrationManager:
** Throw IRE instead of ConfigurationException (and update DropFunctionStatement)
* CFMetaData:
** Use 604800 for gc_grace_seconds (see CASSANDRA-7668)
* FunctionCall:
** prepare(): no need for curly braces on "if" with single statement
** prepare(): s/Unknown CQL3 function called/Unknown function called/
** toString(): no single-line if-statements, please  
* Selectable:
** toString(): no single-line if-statements
* DefsTables:
** add/update/dropFunction() do need to notify listeners (you have TODOs there), so we need
MigrationManager.notifyAdd/Update/DropUDF()
* Event:
** Keyspace string should be empty for UDF changes (we'll need a v4 protocol doc soon)
** serializeEvent() and eventSerializedSize() will need to handle function events specially
when the protocolversion < 4.  I guess we shouldn't send events at all? I'm not too sure
about this one.  
* CreateFunctionStatement:
** constructor: Instead of throwing NPEs, use assertions
** changeEvent(): whitespace is off 
** execute(): throw IREs instead of ConfigurationExceptions
* DropFunctionStatement:
** Unused imports
* UDFunction:
** "throws" can go on the same line as method signature
** Throw IRE instead of ConfigurationException  
* UDFRegistry:
** no need for curlies on single-statement "ifs" (there are a few)
** refreshInitial():
*** use QueryProcess.executeOnceInternal() instead of process()
*** put for-loop initialization on single line
*** if we fail to query system.schema_functions, let the exception propagate

Overall, the patch is looking really good.  After the next round of fixes, I'll do some testing
and further review.

We also definitely want to get good test coverage of this.  Unit tests that extend CQLTester
would be a good start (and they could use functions from the test class for UDFs).


was (Author: thobbs):
> Application level conflicts may arise if new 'core CQL' functions appear in a future
C* version with the same name of existing UDFs. We should make that clear in documentation
and encourage users to use namespaces. The only alternative that came to my mind was to force
users to use SELECT ::sin(foo)... for non-namespace UDFs - but I'm not happy with that alternative.

I think encouraging the use of namespaces is the best way to go.  Perhaps on startup we could
check for any non-namespaced UDFs that would be inaccessible due to a collision with a (new)
core function name and log a warning?  At that point, the user could define the function within
a namspace.

Comments on the patch:

* UFMetaData:

  * Bad javadoc for the class
  * Redundant work in constructors
  * compatibleArgs():

    * argType should not be null (no TypeAndData)
    * To fix the IRE case, make AssignementTestable.isAssignableTo() not throw an IRE,
      add validateAssignableTo() which does throw an IRE

  * parseCQLType(): want InvalidRequestException instead of ConfigurationException
  * dropFunction(): use ColumnFamily.delete() to remove the whole partition instead of individual
rows

    * Do we need a way to drop a specific signature and not all versions of a function?

  * adding toString() would be nice, as it's expected by the logger in a couple of places
  
* MigrationManager:

  * Throw IRE instead of ConfigurationException (and update DropFunctionStatement)

* CFMetaData:

  * Use 604800 for gc_grace_seconds (see CASSANDRA-7668)

* FunctionCall:

  * prepare(): no need for curly braces on "if" with single statement
  * prepare(): s/Unknown CQL3 function called/Unknown function called/
  * toString(): no single-line if-statements, please
  
* Selectable:

  * toString(): no single-line if-statements

* DefsTables:

  * add/update/dropFunction() do need to notify listeners (you have TODOs there), so we need
MigrationManager.notifyAdd/Update/DropUDF()

* Event:

  * Keyspace string should be empty for UDF changes (we'll need a v4 protocol doc soon)
  * serializeEvent() and eventSerializedSize() will need to handle function events specially
when the protocolversion < 4.  I guess we shouldn't send events at all? I'm not too sure
about this one.
  
* CreateFunctionStatement:

  * constructor: Instead of throwing NPEs, use assertions
  * changeEvent(): whitespace is off 
  * execute(): throw IREs instead of ConfigurationExceptions

* DropFunctionStatement:

  * Unused imports

* UDFunction:

  * "throws" can go on the same line as method signature
  * Throw IRE instead of ConfigurationException
  
* UDFRegistry:
  * no need for curlies on single-statement "ifs" (there are a few)
  * refreshInitial():
  
    * use QueryProcess.executeOnceInternal() instead of process()
    * put for-loop initialization on single line
    * if we fail to query system.schema_functions, let the exception propagate

Overall, the patch is looking really good.  After the next round of fixes, I'll do some testing
and further review.

We also definitely want to get good test coverage of this.  Unit tests that extend CQLTester
would be a good start (and they could use functions from the test class for UDFs).

> Support for pure user-defined functions (UDF)
> ---------------------------------------------
>
>                 Key: CASSANDRA-7395
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7395
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>            Assignee: Robert Stupp
>              Labels: cql
>             Fix For: 3.0
>
>         Attachments: 7395.txt, udf-create-syntax.png, udf-drop-syntax.png
>
>
> We have some tickets for various aspects of UDF (CASSANDRA-4914, CASSANDRA-5970, CASSANDRA-4998)
but they all suffer from various degrees of ocean-boiling.
> Let's start with something simple: allowing pure user-defined functions in the SELECT
clause of a CQL query.  That's it.
> By "pure" I mean, must depend only on the input parameters.  No side effects.  No exposure
to C* internals.  Column values in, result out.  http://en.wikipedia.org/wiki/Pure_function



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message