phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stephen Petschulat (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-4552) Add support for a ROW_TIMESTAMP that doesn't affect the primary key
Date Tue, 23 Jan 2018 19:20:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-4552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336270#comment-16336270
] 

Stephen Petschulat commented on PHOENIX-4552:
---------------------------------------------

> There might be a middle ground, though, but I suspect there will be many edge cases.
If we still continue to return only the single latest version and restrict the filtering done
on ROW_TIMESTAMP to use only greater than or greater than or equal then it might be more feasible. 

That is actually the only case that we need. We would like to specify a timestamp and get
back only most the recent row less than or equal to that timestamp. It will allow us to upload
a batch CSV file and set the timestamp for all rows to the current timestamp. Subsequent uploads
will set a newer timestamp but any queries running against the old timestamp data (assuming
they specify it correctly) will see the old version of the CSV while the new one uploads.

Setting VERSIONS to some reasonable number will allow old ones to get garbage collected automatically.
We just have to make sure we don't have queries that run overly long against old versions.

 

> Add support for a ROW_TIMESTAMP that doesn't affect the primary key
> -------------------------------------------------------------------
>
>                 Key: PHOENIX-4552
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4552
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Stephen Petschulat
>            Priority: Minor
>
> By declaring a ROW_TIMESTAMP constraint on a Phoenix table, it does two things 1) expose
the hbase native timestamp as this column and 2) prepend your primary key with this timestamp
as well.
> It would be useful to have a similar feature that only exposes the hbase native timestamp.
This would allow explicit setting of the timestamp when upserting data while allowing multiple
hbase versions. It is possible to then query for that specific key and version(s).
> Potential approach:
> {code:sql}
> CREATE TABLE COMMENTS (
>    COMMENT_ID INT NOT NULL,
>    REVISION_NUM BIGINT NOT NULL ROW_TIMESTAMP,    // NEW use of keyword
>    COMMENT_BODY TEXT
>    CONSTRAINT PK PRIMARY KEY(COMMENT_ID))
> UPSERT INTO COMMENTS (123, 1, 'edit 1 comment')
> UPSERT INTO COMMENTS (123, 2, 'edit 2 of comment')
> UPSERT INTO COMMENTS (123, 3, 'edit 3 of comment')
> {code}
>  
> Current behavior of ROW_TIMESTAMP would create a new primary for each upsert, so querying
by primary key is no longer straightforward when you don't know the version number at query
time. 
> {code:sql}
> SELECT * FROM COMMENTS WHERE COMMENT_ID = 123  // => returns most recent version
'edit 3 of comment'
> SELECT * FROM COMMENTS WHERE COMMENT_ID = 123 AND REVISION_NUM = 1   // => returns
explicit version 'edit 1 comment'
> {code}
>  
> It can also be useful to return multiple versions (related: PHOENIX-590)
> {code:sql}
> SELECT * FROM COMMENTS WHERE COMMENT_ID = 123 AND REVISION_NUM < 3   // => returns 2
rows
> {code}
>  
> Or just the highest version less than or equal to a particular version (allowing snapshot
queries):
> {code:sql}
> // set CurrentSCN=2 on connection
> SELECT * FROM COMMENTS WHERE COMMENT_ID = 123 // => returns 'edit 2 of comment'
> {code}
> CurrentSCN already allows this type of snapshot query but not against an explicitly set
timestamp with multiple versions. The primary key injection prevents this. The above query would
behave similar to:
> {code:java}
> scan 'COMMENTS', {TIMERANGE => [0, <maxversionid+1>]}
> {code}
>  This returns the highest versioned value for each key that is less than a specified
maximum version number.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message