cassandra-commits mailing list archives

From "Robert Stupp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id
Date Thu, 19 May 2016 15:20:12 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15291273#comment-15291273 ]

Robert Stupp commented on CASSANDRA-10786:
------------------------------------------

Well, leaving the {{id}} (which is the {{MD5Digest}} for the pstmt) as is allows backwards
compatibility.
The purpose of a _fingerprint_ is to provide a hash over {{ResultSet.ResultMetadata}} - something
like a _prepared statement version_.
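To make the id/fingerprint split concrete, here is a minimal Java sketch. It assumes the id is an MD5 over the query string alone and that the fingerprint is an MD5 over an ordered list of "name type" column strings; {{StatementIds}}, {{statementId}} and {{metadataFingerprint}} are hypothetical names, not Cassandra's actual {{MD5Digest}} or {{ResultSet.ResultMetadata}} code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

// Hypothetical sketch: keep the statement id stable and hash the result set metadata separately.
final class StatementIds {

    static byte[] md5(String input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // every Java SE platform is required to provide MD5
            throw new IllegalStateException(e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b)); // negative bytes are printed as unsigned
        }
        return sb.toString();
    }

    // Assumed id scheme: MD5 over the query string only, so ALTER TABLE does not change it.
    static String statementId(String query) {
        return hex(md5(query));
    }

    // Hypothetical fingerprint: MD5 over the ordered "name type" column specs of the result set.
    // The ';' separator is a simplification; a real encoding would need to be unambiguous.
    static String metadataFingerprint(List<String> columns) {
        StringBuilder sb = new StringBuilder();
        for (String column : columns) {
            sb.append(column).append(';');
        }
        return hex(md5(sb.toString()));
    }

    public static void main(String[] args) {
        String query = "SELECT * FROM foo";
        List<String> before = List.of("k int", "b text", "c text");
        List<String> after = List.of("k int", "b text", "c text", "bar text");
        System.out.println("id          " + statementId(query));
        System.out.println("fingerprint " + metadataFingerprint(before) + " -> " + metadataFingerprint(after));
    }
}
```

In this sketch, adding {{bar}} changes the fingerprint but leaves the id untouched, so a client could detect stale metadata without existing ids being invalidated.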

Imagine that a non-negligible amount of time can elapse before all cluster nodes have processed
the schema change. Nodes can be down for whatever reason and receive the schema change late. Some
nodes can be unreachable from other nodes but still be available to clients. (Network partitions
occur when you least need them.)
Additionally, a client probably talks to all nodes "simultaneously" and therefore gets different
results from nodes that have processed the schema change and from those that have not.
Different results mean: some nodes will say "I don't know that pstmt ID, please re-prepare"
while others respond as expected.
We should not make such a situation (schema disagreement) worse than it already is by causing
a _prepare storm_.

For example, say you have an application that runs 100,000 queries per second for a prepared
statement.
At time=0, an {{ALTER TABLE foo ADD bar text}} is run. Say the schema migration takes 500ms
(an arbitrary number) until all nodes have "switched" their schema. That is 50,000 queries
(100,000/s × 0.5s) issued during the disagreement window: a query may hit a node that has the
new schema and re-prepare, only for the next request to hit another node that does not have
the new schema yet.
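The ping-pong described above can be counted with a toy model. This is a sketch under strong assumptions: a single client that round-robins over four nodes, two of which already have the new schema; every node already caches the statement under the id it computes for its own schema; and, in the hypothetical variant where the id includes a metadata hash, migrated and unmigrated nodes compute different ids. {{PrepareStorm}} and {{countReprepares}} are made-up names, and real drivers use load-balancing policies rather than plain round-robin.

```java
// Hypothetical toy model of a "prepare storm" during schema disagreement.
final class PrepareStorm {

    /**
     * Counts UNPREPARED responses for one client that round-robins over the nodes.
     * Ids are modelled as 0 (computed against the old schema) and 1 (new schema).
     * Simplifications: every node already caches the statement under the id it computes
     * for its own schema (the one-off CASSANDRA-7910 invalidation is ignored), and the
     * client adopts whatever id the node it re-prepared on returns.
     */
    static long countReprepares(boolean[] migrated, long requests, boolean idIncludesMetadataHash) {
        int clientId = 0; // prepared before the ALTER TABLE, so the client holds the old id
        long reprepares = 0;
        for (long r = 0; r < requests; r++) {
            boolean nodeMigrated = migrated[(int) (r % migrated.length)];
            // Without the metadata hash every node computes the same id.
            int nodeId = (idIncludesMetadataHash && nodeMigrated) ? 1 : 0;
            if (clientId != nodeId) {
                reprepares++;      // node answers UNPREPARED
                clientId = nodeId; // client re-prepares on that node and adopts its id
            }
        }
        return reprepares;
    }

    public static void main(String[] args) {
        boolean[] halfMigrated = {true, true, false, false}; // 4 nodes, 2 have the new schema
        long requests = 100_000L / 2;                         // 100,000 queries/s for 500 ms
        System.out.println(countReprepares(halfMigrated, requests, true));  // 25000
        System.out.println(countReprepares(halfMigrated, requests, false)); // 0
    }
}
```

With the id left as is, no request is answered with UNPREPARED in this model; with the metadata hash folded into the id, half of the 50,000 requests trigger a re-prepare, and with two nodes alternating between old and new schema, every one of them does.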

Also, the information a driver gets via the _control connection_ is not "just in time" - unlucky
driver instances may get the schema change notification via the control connections quite
late.

I'm not a fan of changing the way we compute the pstmt {{id}} as we please between versions
(either C* releases or protocol versions), for the same reasons. I agree that we should probably
not put the algorithm used to compute such IDs into the native protocol specification, but
we should keep that algorithm consistent.


> Include hash of result set metadata in prepared statement id
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-10786
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10786
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>            Reporter: Olivier Michallat
>            Assignee: Alex Petrov
>            Priority: Minor
>              Labels: client-impacting, protocolv5
>             Fix For: 3.x
>
>
> This is a follow-up to CASSANDRA-7910, which was about invalidating a prepared statement
> when the table is altered, to force clients to update their local copy of the metadata.
> There's still an issue if multiple clients are connected to the same host. The first
> client to execute the query after the cache was invalidated will receive an UNPREPARED
> response, re-prepare, and update its local metadata. But other clients might miss it
> entirely (the MD5 hasn't changed), and they will keep using their old metadata. For example:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123; clientA and clientB
>   both have a local cache of the metadata (columns b and c)
> # column a gets added to the table, and C* invalidates its cache entry
> # clientA sends an EXECUTE request for md5 abc123, gets an UNPREPARED response, re-prepares
>   on the fly and updates its local metadata to (a, b, c)
> # the prepared statement is now in C*'s cache again, with the same md5 abc123
> # clientB sends an EXECUTE request for id abc123. Because the cache has been populated
>   again, the query succeeds. But clientB has still not updated its metadata; it is still (b, c)
> One solution that was suggested is to include a hash of the result set metadata in the
> md5. This way the md5 would change at step 3, and any client using the old md5 would get an
> UNPREPARED, regardless of whether another client already re-prepared.
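Steps 1 to 5 and the suggested fix can be reproduced with a single-node toy model in Java. All class and method names are hypothetical, ids are plain strings standing in for MD5 digests, and an EXECUTE that returns {{null}} stands for an UNPREPARED response; the model only checks whether a client's local metadata matches what the node encodes rows with.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-node toy model of steps 1-5.
final class StaleMetadataDemo {

    static final class Node {
        private final boolean idIncludesMetadataHash;
        private List<String> tableColumns;
        private final Map<String, List<String>> cache = new HashMap<>(); // id -> result metadata

        Node(List<String> columns, boolean idIncludesMetadataHash) {
            this.tableColumns = new ArrayList<>(columns);
            this.idIncludesMetadataHash = idIncludesMetadataHash;
        }

        // Readable stand-in for an MD5 digest: the string that would be hashed.
        String computeId(String query) {
            return idIncludesMetadataHash ? query + "|" + String.join(",", tableColumns) : query;
        }

        String prepare(String query) {
            String id = computeId(query);
            cache.put(id, new ArrayList<>(tableColumns));
            return id;
        }

        // Returns the metadata the rows are encoded with, or null for an UNPREPARED response.
        List<String> execute(String id) {
            return cache.get(id);
        }

        // ALTER TABLE ... ADD: change the schema and invalidate prepared statements (CASSANDRA-7910).
        void addColumn(String column) {
            List<String> columns = new ArrayList<>();
            columns.add(column);
            columns.addAll(tableColumns);
            tableColumns = columns;
            cache.clear();
        }
    }

    static final class Client {
        String id;
        List<String> metadata;

        void prepare(Node node, String query) {
            id = node.prepare(query);
            metadata = node.execute(id);
        }

        // Returns true if the client's local metadata matches what the node encodes rows with.
        boolean execute(Node node, String query) {
            List<String> serverMetadata = node.execute(id);
            if (serverMetadata == null) { // UNPREPARED: re-prepare on the fly
                prepare(node, query);
                serverMetadata = metadata;
            }
            return serverMetadata.equals(metadata);
        }
    }

    static boolean clientBUpToDate(boolean idIncludesMetadataHash) {
        String query = "SELECT * FROM foo";
        Node node = new Node(List.of("b", "c"), idIncludesMetadataHash);
        Client clientA = new Client();
        Client clientB = new Client();
        clientA.prepare(node, query);        // step 1
        clientB.prepare(node, query);
        node.addColumn("a");                 // step 2
        clientA.execute(node, query);        // steps 3 and 4
        return clientB.execute(node, query); // step 5
    }

    public static void main(String[] args) {
        System.out.println("id from query only:     clientB up to date? " + clientBUpToDate(false)); // false
        System.out.println("id includes metadata:   clientB up to date? " + clientBUpToDate(true));  // true
    }
}
```

With the id computed from the query alone, clientB's EXECUTE succeeds against the re-populated cache while clientB still decodes with (b, c); folding the metadata into the id makes that same EXECUTE come back UNPREPARED, so clientB re-prepares and picks up (a, b, c).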



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
