cassandra-commits mailing list archives

From "xin jin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-13904) Performance improvement of Cassandra UDF/UDA
Date Tue, 26 Sep 2017 20:32:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16181515#comment-16181515 ]

xin jin commented on CASSANDRA-13904:
-------------------------------------

Simple experiments:

{code}
// Test function:
createTable("CREATE TABLE %s (a int primary key, b int)");
// Insert rows 1..9999 with b equal to a.
for (int i = 1, m = 10000; i < m; i++) {
    String queryString = "INSERT INTO %s (a, b) " + String.format("VALUES (%d, %d)", i, i);
    execute(queryString);
}
// State function: adds b to the running state a (a is null on the first call).
String fState = createFunction(KEYSPACE,
                               "int, int",
                               "CREATE FUNCTION %s(a int, b int) " +
                               "CALLED ON NULL INPUT " +
                               "RETURNS int " +
                               "LANGUAGE java " +
                               "AS 'return Integer.valueOf((a!=null?a.intValue():0) + b.intValue());'");
String a = createAggregate(KEYSPACE,
                           "int, int",
                           "CREATE AGGREGATE %s(int) " +
                           "SFUNC " + shortFunctionName(fState) + " " +
                           "STYPE int");
// 1 + 2 + ... + 9999 = 49995000
assertRows(execute("SELECT " + a + "(b) FROM %s"), row(49995000));
{code}
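
As a quick sanity check on the expected value in the assertion: the table holds b = 1..9999, so the aggregate should return 9999 * 10000 / 2. The standalone sketch below is not part of the Cassandra test; the class and method names are made up for illustration.

```java
public class ExpectedSum {
    // Sum of the integers in [fromInclusive, toExclusive), mirroring the insert loop bounds.
    static int sumByLoop(int fromInclusive, int toExclusive) {
        int sum = 0;
        for (int i = fromInclusive; i < toExclusive; i++) {
            sum += i;
        }
        return sum;
    }

    // Closed form for 1 + 2 + ... + n.
    static int sumByFormula(int n) {
        return n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(sumByLoop(1, 10000)); // value inserted rows add up to
        System.out.println(sumByFormula(9999));  // same value via n(n+1)/2
    }
}
```

Both methods give 49995000, which matches the value passed to row(...) in the test.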

results:

1. enable_user_defined_functions_threads: false

TRACE: UDAggregate.java:198 - Executed UDA cql_test_keyspace.aggregate_2: 9999 call(s) to state function cql_test_keyspace.function_1 in 37259μs, 17297μs, 26131μs

2. enable_user_defined_functions_threads: true

UDAggregate.java:198 - Executed UDA cql_test_keyspace.aggregate_2: 9999 call(s) to state function cql_test_keyspace.function_1 in 555004μs, 457931μs, 475664μs
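
To put the two result lines side by side, here is a small sketch that turns them into a slowdown factor and an extra cost per state-function call. It assumes the three values on each line are total elapsed times from three separate runs, which the log output does not state explicitly. The class and method names are hypothetical.

```java
public class UdfOverhead {
    // Mean of the elapsed times (in microseconds) taken from one trace line.
    static double meanMicros(long... samples) {
        long sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return (double) sum / samples.length;
    }

    // Extra time per state-function call attributable to the async path.
    static double perCallOverheadMicros(double asyncMean, double directMean, int calls) {
        return (asyncMean - directMean) / calls;
    }

    public static void main(String[] args) {
        int calls = 9999; // "9999 call(s)" in both trace lines
        double direct = meanMicros(37259, 17297, 26131);     // threads: false
        double async = meanMicros(555004, 457931, 475664);   // threads: true
        System.out.println("direct mean (us): " + direct);
        System.out.println("async mean (us):  " + async);
        System.out.println("slowdown factor:  " + async / direct);
        System.out.println("extra us / call:  " + perCallOverheadMicros(async, direct, calls));
    }
}
```

Under that assumption the async path is roughly 18 times slower on average, or about 47μs of extra time per call in this experiment.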


> Performance improvement of Cassandra UDF/UDA
> --------------------------------------------
>
>                 Key: CASSANDRA-13904
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13904
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: CQL
>            Reporter: xin jin
>            Priority: Critical
>              Labels: performance
>             Fix For: 3.11.x
>
>
> Hi All,
> We ran a few experiments and found that running a query with direct UDF execution is about ten times faster than with async UDF execution. The inline comment "Using async UDF execution is expensive (adds about 100us overhead per invocation on a Core-i7 MBPr)" at https://insight.io/github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/functions/UDFunction.java?line=293 shows that this is known behavior. My questions are:
> 1. What are the main pros and cons of these two methods? Is there any documentation that discusses this?
> 2. Are there any plans to improve the performance of async UDF execution? One simple approach that comes to mind is batching, e.g., replacing the current row-by-row invocation with one invocation per group of rows. Are there any concerns with this?
> 3. How do people generally work around this performance issue? Since it is known and still present, it does not seem to be treated as urgent or important, so I assume people have a good way of dealing with it.
> I really appreciate your comments in advance.
> Best regards,
> Xin
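
The batching idea in question 2 can be illustrated with a plain java.util.concurrent sketch. This is not Cassandra's UDF executor, and it ignores whatever timeout or sandboxing the real async path provides. All class and method names are hypothetical. It only shows that handing a group of rows to the executor per task gives the same aggregate as one task per row, with fewer thread hand-offs.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntBinaryOperator;

public class BatchedAsyncSketch {
    // Wait for a task result, wrapping checked exceptions for brevity.
    static int await(Future<Integer> future) {
        try {
            return future.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // One executor task per row: a thread hand-off for every state-function call.
    static int rowByRow(ExecutorService pool, int[] rows, IntBinaryOperator stateFn) {
        int state = 0;
        for (int row : rows) {
            final int current = state;
            Callable<Integer> task = () -> stateFn.applyAsInt(current, row);
            state = await(pool.submit(task));
        }
        return state;
    }

    // One executor task per batch: the state function is folded over batchSize rows per hand-off.
    static int batched(ExecutorService pool, int[] rows, int batchSize, IntBinaryOperator stateFn) {
        int state = 0;
        for (int start = 0; start < rows.length; start += batchSize) {
            final int from = start;
            final int to = Math.min(start + batchSize, rows.length);
            final int initial = state;
            Callable<Integer> task = () -> {
                int s = initial;
                for (int i = from; i < to; i++) {
                    s = stateFn.applyAsInt(s, rows[i]);
                }
                return s;
            };
            state = await(pool.submit(task));
        }
        return state;
    }

    public static void main(String[] args) {
        int[] rows = new int[9999];
        for (int i = 0; i < rows.length; i++) {
            rows[i] = i + 1; // same data as the experiment: 1..9999
        }
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            System.out.println(rowByRow(pool, rows, Integer::sum));
            System.out.println(batched(pool, rows, 100, Integer::sum));
        } finally {
            pool.shutdown();
        }
    }
}
```

With a batch size of 100, the 9999 rows need 100 executor submissions instead of 9999. Whether that trade-off is acceptable depends on how a per-call time limit would be applied to a whole batch, which this sketch does not address.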



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


