phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Cameron Hatfield (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-6) Support ON DUPLICATE KEY construct
Date Tue, 04 Oct 2016 20:44:20 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-6?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546605#comment-15546605
] 

Cameron Hatfield commented on PHOENIX-6:
----------------------------------------

Ahh, I saw where I was getting confused. I thought that Phoenix implemented executebatch according
to the JDBC contract (multiple statements will be sent in one go to the server), however instead
they send each statement separately to the underlying db.

Would probably be worthwhile to mention that if you want to use this for counters / etc, that
each separate statement hitting the same row requires a separate commit (just to make it more
obvious that it is how it would work). This would also be an interesting interaction with
how UPSERT ... SELECT batches inserts, when duplicates are involved, since the it would act
differently depending on where duplicates where in the batch. Technically, I believe this
is a problem now, as UPSERT .. SELECT would allow you to read data that is not yet "committed",
so its possible that the documentation for that should be updated (as a separate bug): "Non
transactional tables have no guarantees above and beyond the HBase guarantee of row level
atomicity (see here). *In addition, non transactional tables will not see their updates until
after a commit has occurred.*"



> Support ON DUPLICATE KEY construct
> ----------------------------------
>
>                 Key: PHOENIX-6
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: James Taylor
>            Assignee: James Taylor
>             Fix For: 4.9.0
>
>
> To support inserting a new row only if it doesn't already exist, we should support the
"on duplicate key" construct for UPSERT. With this construct, the UPSERT VALUES statement
would run atomically and would thus require a read before write which would obviously have
a negative impact on performance. For an example of similar syntax , see MySQL documentation
at http://dev.mysql.com/doc/refman/5.7/en/insert-on-duplicate.html
> See this discussion for more detail: https://groups.google.com/d/msg/phoenix-hbase-user/Bof-TLrbTGg/68bnc8ZcWe0J.
A related discussion is on PHOENIX-2909.
> Initially we'd support the following:
> # This would prevent the setting of VAL to 0 if the row already exists:
> {code}
> UPSERT INTO T (PK, VAL) VALUES ('a',0) 
> ON DUPLICATE KEY IGNORE;
> {code}
> # This would increment the valueS of COUNTER1 and COUNTER2 if the row already exists
and otherwise initialize them to 0:
> {code}
> UPSERT INTO T (PK, COUNTER1, COUNTER2) VALUES ('a',0,0) 
> ON DUPLICATE KEY COUNTER1 = COUNTER1 + 1, COUNTER2 = COUNTER2 + 1;
> {code}
> So the general form is:
> {code}
> UPSERT ... VALUES ... [ ON DUPLICATE KEY [IGNORE | UPDATE <column>=<expression>,
...] ]
> {code}
> The following restrictions will apply:
> - The <column> may not be part of the primary key constraint - only KeyValue columns
will be allowed.
> - If the table is immutable, the <column> may not appear in a secondary index.
This is because the mutations for indexes on immutable tables are calculated on the client-side,
while this new syntax would potentially modify the value on the server-side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message