phoenix-dev mailing list archives

From "James Taylor (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-3209) Ensure scans run at specific server timestamp for UPSERT SELECT to same table
Date Wed, 17 May 2017 22:46:04 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014887#comment-16014887 ]

James Taylor commented on PHOENIX-3209:
---------------------------------------

When a client uses the UPDATE_CACHE_FREQUENCY feature, they're basically saying that they
*don't* want to ping the server for a timestamp at which to run the query every time. So in
this case, we don't restrict the upper time range of the scan. The rule in SQL is that a statement
should not see the changes that it's making. The only time this is an issue is when the same
table is being read and written to. HBase already prevents this through its MVCC model, but
this would break down if a split occurs, as we'd end up issuing a new scan under a different
MVCC lock. This is definitely a corner case. It'd require a split to occur and some of the
data written earlier by the statement to have landed in a new daughter region that hadn't
been read yet.
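
To make the corner case concrete, here is a toy model (not Phoenix or HBase code) of a same-table UPSERT SELECT whose second region is read by a fresh scan, as would happen after a split. The `Cell` type, the two-region layout, the row keys, and the timestamps are all hypothetical; the only borrowed fact is that an HBase scan time range is half-open, `[min, max)`. With the upper bound fixed at the server timestamp, the rescan cannot see the statement's own writes; with no upper bound, it can.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    ts: int  # timestamp this cell version was written at


def scan(region, max_ts_exclusive):
    """Cells visible to a scan whose time range is [0, max_ts_exclusive)."""
    return [c for c in region if c.ts < max_ts_exclusive]


def upsert_select_same_table(max_ts_exclusive, write_ts):
    """Toy `UPSERT INTO t SELECT ... FROM t` over two regions of the same table.

    Region A ("a*" keys) is scanned first and every row read is copied to a new
    "b-*" key, which lands in region B: a stand-in for a daughter region that has
    not been scanned yet. Region B is then read by a *new* scan, as after a split.
    Returns the total number of rows the statement read.
    """
    region_a = [Cell("a1", 10), Cell("a2", 10)]
    region_b = [Cell("b1", 10)]
    rows_read = 0
    for cell in scan(region_a, max_ts_exclusive):
        rows_read += 1
        # The statement's own write; its key sorts into region B's range and it is
        # assumed to be stamped after the read timestamp.
        region_b.append(Cell("b-" + cell.row, write_ts))
    rows_read += len(scan(region_b, max_ts_exclusive))
    return rows_read


server_ts = 11  # hypothetical timestamp returned by the schema-check RPC
write_ts = 12   # hypothetical timestamp of the statement's own writes
print(upsert_select_same_table(server_ts, write_ts))     # → 3 (bounded: own writes hidden)
print(upsert_select_same_table(float("inf"), write_ts))  # → 5 (unbounded: reads its 2 own writes)
```

The extra two rows in the unbounded case are exactly the rows the statement wrote itself, which is what the SQL rule above forbids.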

The point you're making is a different issue: should SQL see future-timestamped data? Currently
the behavior would be inconsistent. If you're using UPDATE_CACHE_FREQUENCY, you'd see the
future-timestamped data unless you get a cache miss (after expiration), in which case the query
would be run with an upper time bound. I'm not sure what the best answer is for this. Maybe we
should always hit the server for an UPSERT SELECT or a DELETE that issues a scan? How about
filing a new JIRA for this one so we can brainstorm?
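
A rough sketch of that inconsistency, assuming a simplified client that skips the schema-check RPC while its cached metadata is younger than UPDATE_CACHE_FREQUENCY, and otherwise bounds the scan at the timestamp the server returns. The function names, the `force_rpc` flag (standing in for the "always hit the server for UPSERT SELECT or DELETE" idea), and all numbers are hypothetical, and server and client clocks are assumed equal:

```python
def scan_upper_bound(now_ms, last_resolved_ms, update_cache_frequency_ms,
                     server_ts, force_rpc=False):
    """Exclusive upper bound of a query's scan time range (toy model).

    While the cached table metadata is younger than UPDATE_CACHE_FREQUENCY and no
    RPC is forced, the client skips the server round trip and leaves the scan
    unbounded. Otherwise it pings the server and bounds the scan at the
    returned timestamp.
    """
    cache_fresh = now_ms - last_resolved_ms < update_cache_frequency_ms
    if cache_fresh and not force_rpc:
        return float("inf")
    return server_ts


def is_visible(cell_ts, upper_bound):
    """A cell is visible if its timestamp falls below the exclusive upper bound."""
    return cell_ts < upper_bound


NOW = 1_000_000
FREQ = 60_000                  # UPDATE_CACHE_FREQUENCY, in milliseconds
FUTURE_CELL_TS = NOW + 60_000  # a row written with an explicit future timestamp

hit = scan_upper_bound(NOW, last_resolved_ms=990_000,
                       update_cache_frequency_ms=FREQ, server_ts=NOW)
miss = scan_upper_bound(NOW, last_resolved_ms=900_000,
                        update_cache_frequency_ms=FREQ, server_ts=NOW)
forced = scan_upper_bound(NOW, last_resolved_ms=990_000,
                          update_cache_frequency_ms=FREQ, server_ts=NOW,
                          force_rpc=True)

print(is_visible(FUTURE_CELL_TS, hit))     # True: cache hit, unbounded scan
print(is_visible(FUTURE_CELL_TS, miss))    # False: cache expired, scan bounded at NOW
print(is_visible(FUTURE_CELL_TS, forced))  # False: RPC forced, scan bounded at NOW
```

The same query flips between seeing and not seeing the future row depending only on cache age; forcing the RPC makes the answer deterministic at the cost of a round trip.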

> Ensure scans run at specific server timestamp for UPSERT SELECT to same table
> -----------------------------------------------------------------------------
>
>                 Key: PHOENIX-3209
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3209
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Maddineni Sukumar
>             Fix For: 4.11.0
>
>
> This is a corner case of specifying an UPDATE_CACHE_FREQUENCY on a table and executing
> an UPSERT SELECT. Without an UPDATE_CACHE_FREQUENCY, we ping the server to ensure we have
> the latest version of the schema. We'll then run the query based on the server timestamp returned
> as a result of checking that the schema is up-to-date. If an UPDATE_CACHE_FREQUENCY is set,
> we skip this RPC, which is a potential problem in this case. This becomes more likely when
> we introduce a default UPDATE_CACHE_FREQUENCY with PHOENIX-2885. The fix is to ignore the UPDATE_CACHE_FREQUENCY
> when an UPSERT SELECT is performed where the source and target tables are the same.
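
A minimal sketch of the proposed rule, assuming a hypothetical `Statement` descriptor; it is not Phoenix's actual implementation. It treats a frequency of 0 as "always ping", matching the behavior the description gives for tables without an UPDATE_CACHE_FREQUENCY, and forces the RPC for an UPSERT SELECT whose source and target are the same table:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    kind: str                    # hypothetical kinds, e.g. "SELECT", "UPSERT_SELECT"
    target_table: Optional[str]  # table written to, if any
    source_table: Optional[str]  # table scanned, if any


def must_ping_server(stmt, update_cache_frequency_ms, ms_since_last_resolve):
    """Decide whether to issue the schema-check RPC before running stmt.

    Normally the RPC is skipped while the cached metadata is younger than
    UPDATE_CACHE_FREQUENCY. A same-table UPSERT SELECT ignores the setting, so
    its scan always runs at a specific server timestamp.
    """
    same_table_upsert_select = (
        stmt.kind == "UPSERT_SELECT"
        and stmt.source_table is not None
        and stmt.source_table == stmt.target_table
    )
    if same_table_upsert_select:
        return True
    return ms_since_last_resolve >= update_cache_frequency_ms


print(must_ping_server(Statement("UPSERT_SELECT", "T", "T"), 60_000, 5_000))  # True
print(must_ping_server(Statement("UPSERT_SELECT", "T", "S"), 60_000, 5_000))  # False
print(must_ping_server(Statement("SELECT", None, "T"), 60_000, 70_000))       # True
```

Keying the override on "same source and target" keeps the extra RPC confined to the one case where the statement could read its own writes.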



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
