cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9954) Improve Java-UDF timeout detection
Date Tue, 08 Dec 2015 17:10:11 GMT


Ariel Weisberg commented on CASSANDRA-9954:

Thanks Robert. Are we going to do this after CASSANDRA-10395?

I know this isn't part of this issue, but the whitelist and blacklist as constants seem a
little problematic. Just from a deployment and maintenance perspective, allowing people to
manipulate them (mechanism, not policy), as well as warning for some things rather than straight
up blocking them, seems appropriate. If one thing we want to let people do is leverage existing
code inside UDFs then we don't want to be too inflexible. Definitely not something to do as
part of this, but I am broaching the subject.

Do we allow UDFs in writes? I read the blog post and it seems like you can mark the UDFs as
deterministic/non-deterministic. Part of paving the path for determinism is disallowing currentTimeMillis()
and nanoTime(). If they want the time, they should pass it to the UDF as a parameter when they
invoke the query. The same could be said for random number generation. For deterministic UDFs
you might be much more strict or have different warning/error policies for calling different
functions. Doing DNS resolution from a UDF isn't technically wrong if they have good caching
and timeouts in place (or we provide that for them).
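
To make the "pass time as a parameter" point concrete, here is a minimal sketch in plain Java. The class and method names are hypothetical and are not taken from the patch or from any Cassandra API; it only contrasts a function that reads the clock itself with one that receives the timestamp from the caller.

```java
/** Hypothetical illustration only; not part of Cassandra or the patch. */
final class DeterministicUdfSketch {

    // Non-deterministic: the result depends on when (and on which node) it runs.
    static long ageMillisNonDeterministic(long createdAtMillis) {
        return System.currentTimeMillis() - createdAtMillis;
    }

    // Deterministic: the caller supplies "now" when invoking the query,
    // so every node evaluating the function computes the same result.
    static long ageMillis(long createdAtMillis, long nowMillis) {
        return nowMillis - createdAtMillis;
    }
}
```

With the second form, two evaluations with the same arguments always agree, which is the property a deterministic UDF would need.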

For reads, do UDFs only run at the coordinator, or remotely at replicas before results are returned?
I suppose it doesn't really matter since the pain when versions or configurations have different
whitelist/blacklist settings is the same.

Checking metrics every 16 times is a little bit too often for most loop iterations. Maybe
make that a property? The check is not cheap and represents at least a hundred nanoseconds
of work, possibly more. How often will people actually have loops to iterate through in UDFs?
I imagine if they tear apart a collection or a JSON doc it will be pretty heavyweight stuff.
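
A rough sketch of what I mean by making the interval a property, under stated assumptions: the class name, the property name {{udf.amok_check_interval}}, and the default of 1024 are all made up for illustration, and {{System.nanoTime()}} stands in for whatever metrics the real check reads. This is not the code from the patch.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a sampled timeout check; not the code from the patch. */
final class AmokCheck {
    // Assumed property name and default, for illustration only.
    static final int CHECK_INTERVAL = Integer.getInteger("udf.amok_check_interval", 1024);

    private final long deadlineNanos;
    private final int interval;
    private int counter;
    int expensiveChecks; // exposed so the sampling rate can be observed

    AmokCheck(long timeout, TimeUnit unit, int interval) {
        if (interval < 1) throw new IllegalArgumentException("interval must be >= 1");
        this.deadlineNanos = System.nanoTime() + unit.toNanos(timeout);
        this.interval = interval;
    }

    /** Called from instrumented code; only every interval-th call reads the clock. */
    void check() {
        if (++counter < interval) return; // cheap path: increment and compare
        counter = 0;
        expensiveChecks++;
        // Overflow-safe comparison of nanoTime values.
        if (System.nanoTime() - deadlineNanos > 0)
            throw new IllegalStateException("UDF exceeded its time budget");
    }
}
```

A caller would construct it with something like {{new AmokCheck(500, TimeUnit.MILLISECONDS, AmokCheck.CHECK_INTERVAL)}} and the instrumented byte-code would call {{check()}}. The trade-off is that a larger interval lowers the per-iteration cost but delays detection of a runaway loop.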

This isn't just verifying anymore, it's verifyAndInstrument.

I am not completely familiar with what the compiler does when emitting the labels for bytecode.
Does it have a convention to insert in a bunch of places? Inserting a check at all the labels
seems a bit excessive, but it's just performance, so rather than guess as to how it works let's
just measure the performance in a meaningful way. Do we have a benchmark workload we could
run in cstar that would test UDF performance? Maybe one for a lightweight UDF and another
for the heaviest weight UDF we think we will come across? For the lightweight UDF we may want
to test an expression that invokes several UDFs per query so that it magnifies the transaction
cost of starting a UDF.

This is just my first pass reaction. I need to read up on the libraries you are using to do
byte code manipulation and how labels work.

> Improve Java-UDF timeout detection
> ----------------------------------
>                 Key: CASSANDRA-9954
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Robert Stupp
>            Assignee: Robert Stupp
>             Fix For: 3.x
> CASSANDRA-9402 introduced a sandbox using a thread-pool to enforce security constraints
and to detect "amok UDFs" - i.e. UDFs that essentially never return (e.g. {{while (true)}}).
> Currently the safest way to react to such an "amok UDF" is to _fail-fast_ - to stop the
C* daemon since stopping a thread (in Java) is just no solution.
> CASSANDRA-9890 introduced further protection by inspecting the byte-code. The same mechanism
can also be used to manipulate the Java-UDF byte-code.
> By manipulating the byte-code I mean to add regular "is-amok-UDF" checks in the compiled
> EDIT: These "is-amok-UDF" checks would also work for _UNFENCED_ Java-UDFs.
