cassandra-commits mailing list archives

From "Robert Stupp (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-10786) Include hash of result set metadata in prepared statement id
Date Thu, 19 May 2016 13:49:12 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15291106#comment-15291106 ]

Robert Stupp edited comment on CASSANDRA-10786 at 5/19/16 1:48 PM:
-------------------------------------------------------------------

Oh, right. We invalidate a pstmt when one of its dependencies changes, so I was
overcomplicating things.

Another possible way to solve the opt-in/long-hash problem would be to add a second identifier:
a hash over the result set metadata. The current ID would stay as it is, and we would add
a _fingerprint_ to the _Prepared_ response and the _Execute_ request.

For native_protocol_v5.spec, section _4.2.5.4. Prepared_ would contain:
{code}
    - <id> is [short bytes] representing the prepared query ID.
    - <fingerprint> is [short bytes] representing the metadata hash.
    - <metadata> is composed of:
{code}
And the body for _4.1.6 Execute_ would be {{<id><fingerprint><query_parameters>}}.
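
As a rough illustration of that body layout, here is a minimal Python sketch. It assumes the native protocol's usual [short bytes] encoding (a 2-byte big-endian length followed by the bytes), treats {{<query_parameters>}} as an opaque byte string, and uses MD5 over name/type pairs as the fingerprint. The hash input and all function names are assumptions made for the example, not part of the proposal.

```python
import hashlib
import struct


def short_bytes(data: bytes) -> bytes:
    """Encode [short bytes]: 2-byte big-endian length, then the bytes."""
    if len(data) > 0xFFFF:
        raise ValueError("value too long for [short bytes]")
    return struct.pack(">H", len(data)) + data


def read_short_bytes(buf: bytes, offset: int):
    """Decode one [short bytes] value; return (value, next_offset)."""
    (n,) = struct.unpack_from(">H", buf, offset)
    start = offset + 2
    return buf[start:start + n], start + n


def metadata_fingerprint(columns) -> bytes:
    """Hypothetical fingerprint: MD5 over (name, type) pairs in column order."""
    h = hashlib.md5()
    for name, cql_type in columns:
        h.update(name.encode("utf-8") + b"\x00")
        h.update(cql_type.encode("utf-8") + b"\x00")
    return h.digest()


def encode_execute_body(stmt_id: bytes, fingerprint: bytes,
                        query_parameters: bytes) -> bytes:
    """Build <id><fingerprint><query_parameters>."""
    return short_bytes(stmt_id) + short_bytes(fingerprint) + query_parameters


def decode_execute_body(body: bytes):
    """Split a body back into (id, fingerprint, query_parameters)."""
    stmt_id, offset = read_short_bytes(body, 0)
    fingerprint, offset = read_short_bytes(body, offset)
    return stmt_id, fingerprint, body[offset:]
```

The existing ID is untouched; only one extra [short bytes] field is carried next to it.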

To handle the situation where the result set metadata fingerprint does not match, there are
two options IMO.
# The coordinator could reply with a new error code (close to 0x2500, Unprepared) telling the
client that the result set metadata no longer matches and the statement needs to be prepared
again.
# We just send out the result set metadata with the _Rows_ response in case the metadata has
changed / does not match the fingerprint.

The second option would also work around a race condition that could arise with a new error
code during schema changes: some nodes may already use the new result set metadata
while others still use the old one. It would also save one roundtrip. It probably makes the
client code a bit more complex, but I think that price is worth paying in order
to prevent this race condition (and a _prepare storm_).
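
The second option could look roughly like the following Python sketch. Everything here is hypothetical: the class and function names, the MD5-based fingerprint, and the choice to return the new fingerprint together with the metadata are assumptions made for illustration only.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional, Tuple

Column = Tuple[str, str]  # (name, type); a simplified stand-in for real metadata


def fingerprint(columns: List[Column]) -> bytes:
    """Hypothetical fingerprint: MD5 over (name, type) pairs in column order."""
    h = hashlib.md5()
    for name, cql_type in columns:
        h.update(name.encode("utf-8") + b"\x00" + cql_type.encode("utf-8") + b"\x00")
    return h.digest()


@dataclass
class RowsResponse:
    rows: List[tuple]
    # Set only when the client's fingerprint did not match (option 2).
    metadata: Optional[List[Column]] = None
    new_fingerprint: Optional[bytes] = None  # assumption: sent with the metadata


def coordinator_execute(current_columns: List[Column],
                        client_fingerprint: bytes,
                        rows: List[tuple]) -> RowsResponse:
    """Answer in one roundtrip; attach metadata only on a mismatch."""
    current = fingerprint(current_columns)
    if client_fingerprint == current:
        return RowsResponse(rows=rows)
    return RowsResponse(rows=rows, metadata=list(current_columns),
                        new_fingerprint=current)


class ClientStatement:
    """Client-side view of one prepared statement."""

    def __init__(self, columns: List[Column]):
        self.columns = list(columns)
        self.fingerprint = fingerprint(columns)

    def handle(self, response: RowsResponse) -> List[dict]:
        # Refresh the local copy before decoding if the server sent metadata.
        if response.metadata is not None:
            self.columns = response.metadata
            self.fingerprint = response.new_fingerprint
        names = [name for name, _ in self.columns]
        return [dict(zip(names, row)) for row in response.rows]
```

A client holding a stale fingerprint still gets its rows, decoded against the fresh metadata, without an error and without re-preparing.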


was (Author: snazy):
Oh, right. We invalidate a pstmt when one of its dependencies changes - so, I thought too
complicated.

Another possible way to solve the opt-in/long-hash problem would be to just add another identifier,
which is the hash over the result set metadata. So, the current ID would stay as it is and
we add a _fingerprint_ to _Prepared_ response and _Execute_ request.

For native_protocol_v5.spec, section _4.2.5.4. Prepared_ would contain:
{code}
    - <id> is [short bytes] representing the prepared query ID.
    - <fingerprint> is [short bytes] representing the metadata hash.
    - <metadata> is composed of:
{code}
And the body for _4.1.6 Execute_ would be {{<id><fingerprint><query_parameters>}}.

To handle the situation when that result-set-metadata-fingerprint does not match, there are
two options IMO.
# The coordinator could reply with a new error code (near to 0x2500, Unprepared) telling the
client that the result set metadata no longer matches and the statement needs to be prepared
again.
# We just send out the result set metadata with the _Rows_ response in case it has.

The second option would also work around a race condition that could arise with a new error
code during schema changes. Means: some nodes may already use the new result set metadata
while others still use the old one. It would also save one roundtrip. It makes the code on
the client probably a bit more complex, but I think it's worth to pay that price in order
to prevent this race condition (and _prepare storm_).

> Include hash of result set metadata in prepared statement id
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-10786
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10786
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>            Reporter: Olivier Michallat
>            Assignee: Alex Petrov
>            Priority: Minor
>              Labels: client-impacting, protocolv5
>             Fix For: 3.x
>
>
> This is a follow-up to CASSANDRA-7910, which was about invalidating a prepared statement
> when the table is altered, to force clients to update their local copy of the metadata.
> There's still an issue if multiple clients are connected to the same host. The first
> client to execute the query after the cache was invalidated will receive an UNPREPARED response,
> re-prepare, and update its local metadata. But other clients might miss it entirely (the MD5
> hasn't changed), and they will keep using their old metadata. For example:
> # {{SELECT * ...}} statement is prepared in Cassandra with md5 abc123, clientA and clientB
> both have a cache of the metadata (columns b and c) locally
> # column a gets added to the table, C* invalidates its cache entry
> # clientA sends an EXECUTE request for md5 abc123, gets UNPREPARED response, re-prepares
> on the fly and updates its local metadata to (a, b, c)
> # prepared statement is now in C*’s cache again, with the same md5 abc123
> # clientB sends an EXECUTE request for id abc123. Because the cache has been populated
> again, the query succeeds. But clientB still has not updated its metadata, it’s still (b,c)
> One solution that was suggested is to include a hash of the result set metadata in the
> md5. This way the md5 would change at step 3, and any client using the old md5 would get an
> UNPREPARED, regardless of whether another client already reprepared.
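
The five steps in the description can be replayed with a small in-memory model. This is a toy Python sketch, not Cassandra code: the `Server` and `Client` classes, the `include_metadata_in_id` switch, and the way the ID is derived are all assumptions made to illustrate the scenario.

```python
import hashlib


class Unprepared(Exception):
    """Stand-in for the UNPREPARED error response."""


class Server:
    def __init__(self, columns, include_metadata_in_id=False):
        self.columns = list(columns)
        self.include_metadata_in_id = include_metadata_in_id
        self.cache = {}  # statement id -> query string

    def _statement_id(self, query):
        h = hashlib.md5(query.encode("utf-8"))
        if self.include_metadata_in_id:
            # Suggested fix: the id also depends on the result set metadata.
            h.update(",".join(self.columns).encode("utf-8"))
        return h.hexdigest()

    def prepare(self, query):
        stmt_id = self._statement_id(query)
        self.cache[stmt_id] = query
        return stmt_id, list(self.columns)

    def add_column(self, name):
        self.columns.insert(0, name)
        self.cache.clear()  # invalidate prepared statements on ALTER

    def execute(self, stmt_id):
        if stmt_id not in self.cache:
            raise Unprepared(stmt_id)


class Client:
    def __init__(self, server, query):
        self.server, self.query = server, query
        self.stmt_id, self.metadata = server.prepare(query)

    def execute(self):
        try:
            self.server.execute(self.stmt_id)
        except Unprepared:
            # Re-prepare on the fly and refresh the local metadata.
            self.stmt_id, self.metadata = self.server.prepare(self.query)
            self.server.execute(self.stmt_id)
        return self.metadata


def run(include_metadata_in_id):
    server = Server(["b", "c"], include_metadata_in_id)
    client_a = Client(server, "SELECT * FROM t")
    client_b = Client(server, "SELECT * FROM t")
    server.add_column("a")
    return client_a.execute(), client_b.execute()
```

With `run(False)`, clientB keeps its old (b, c) metadata because its ID is valid again after clientA re-prepares. With `run(True)`, the ID changes when the column is added, so clientB also gets UNPREPARED and ends up with (a, b, c).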



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
