phoenix-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Julian (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-967) Consider (some of) SQL 2011 temporal support
Date Tue, 14 May 2019 11:30:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839329#comment-16839329
] 

Julian commented on PHOENIX-967:
--------------------------------

I have a business use case for which I wish to perform a query to find only the latest event
status rows over a given timeframe, but for which the traditional time series approach taken
in Phoenix-914 is not a good fit. I find the solution for Phoenix-914 is not at a generic
approach to using hbase timestamps, but rather focuses on a given problem. I'm looking rather
looking for a way to use the general hbase features via sql.

The Use Case is as follows -

1) We receive events on basis of an incident number, which is the primary row key.
2) We keep updating the row as we receive more events for that incident with the latest status
information. Thus it is important we maintain a single row with the current status which also
contains the incident created time, last updated time etc, whether the it is still open or
closed etc. Versions may also be interesting in future to see the history of the incident,
but right now we are only interested in most recent version.
3) Some of the End User queries need to find any incidents that have been updated in the last
7 days, thus are time based, however they also need to be able to query a specific incident
for example by it's incident number without triggering long scans. Note, two years of incidents are
in the table, the majority of which will not have been updated since being closed, so querying
on last 7 days is a limited data set of the full data.

In the current phoenix implementation, there seems no way for me to query the latest version
of rows over a time range in Phoenix without using PHOENIX-914 feature, but I can't due to
the primary key requirements (we don't want time in the key). Rather I am looking simply for
a way to specify the TIMERANGE, in a similar way as I can when performing an HBASE timerange
scan directly on Hbase. and get only the latest version in my case.

In addition to the above, I am also not seemingly able to set the row / column timestamps
to the event timestamp in the way I want to also, as seems this is only possibly by making
a connection and setting a connection level property but that would need to be for every event
(not handy) or use the PHOENIX-914 features but that means putting the timestamp into the
primary key which I do not want. So nothing is fitting here well.

It's a shame, as the Use Case is not that complex, hbase wise but also potentially phoenix
wise. Ultimately I am unable to use Phoenix for this use case and we have to look at other
solutions.

Note, secondary indexing we are testing, however this is not delivering the performance results
we need whilst the time range scan via hbase api is working perfectly. This however is not
End User friendly and we were hoping to enable SQL via Hive on Phoenix on Hbase for this use
case for some end user queries. 

 

 

 

> Consider (some of) SQL 2011 temporal support
> --------------------------------------------
>
>                 Key: PHOENIX-967
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-967
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>            Priority: Major
>
> Language extensions for temporal data were added in SQL 2011 (*1), providing among other
things the ability to define period definitions as metadata to table, system-versioned tables
(CREATE TABLE ... WITH SYSTEM VERSIONING), the ability to query by system version time, and
syntax for conditions involving periods: CONTAINS, OVERLAPS, EQUALS, PRECEDES, SUCCEEDS, IMMEDIATELY
PRECEDES, and IMMEDIATELY SUCCEEDS.
> While not proposing support for all of SQL 2011 temporal features, it seems that system
versioning, the new temporal comparison operators, and the new SELECT syntax for versioned
queries map pretty naturally to multiversoned HBase table schema and timerange filtering.
And at least one commercial database supports temporal queries using an older nonstandard
syntax (*2). 
> E.g. from *1:\\
> - {{SELECT ... FOR SYSTEM_TIME AS OF TIMESTAMP <timestamp>...}}
> - {{SELECT ... FOR SYSTEM_TIME BETWEEN TIMESTAMP <timestamp-start> AND TIMESTAMP
<timestamp-end> ...}}
> I didn't pay to download the 2011 standard doc so can't say if TIMESTAMP is required
or if we can stand in any type that can be serialized to a long integer. 
> And from *2: \\
> - {{SELECT ... AS OF TIMESTAMP <timestamp> ...}}
> - {{SELECT ... VERSIONS BETWEEN TIMESTAMP <timestamp-start> AND <timestamp-end>
...}}
> *1: http://www.sigmod.org/publications/sigmod-record/0612/publications/1209/pdfs/07.industry.kulkarni.pdf
> *2: http://www.oracle.com/technetwork/database/storage/total-recall-whitepaper-171749.pdf




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message