phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jeffrey Zhong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-914) Native HBase timestamp support to optimize date range queries in Phoenix
Date Wed, 19 Nov 2014 19:09:34 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14218325#comment-14218325
] 

Jeffrey Zhong commented on PHOENIX-914:
---------------------------------------

I did pursue a pseudo column before. Below are the issues I faced:

1) The main purpose of exposing HBase timestamp is to skip HFiles totally which are not in
time range a user specified. This gives huge scan performance especially for time series data
access pattern. While this requires to know the time range in query compile time. This is
different than the expression in where clause which normally becomes a server filter. By then
it's too late because the filter is working on key value level.

2) Pseudo column approach has to smartly, implicitly derive scan time range from where clause
expression as a super set time range bound. While in cases like:
"time1 < native_time_stamp and time2>native_time_stamp and time3 < native_time_stamp
and time4 > native_time_stamp", there are multiple time ranges and could easily throw the
optimization off. 

3) UDF can't be applied to the Pseudo column as we need to resolve this in compile time.
  
I haven't gone through all the way so there might be some new issues. That's the reason I
moved to query hint approach which is simple and can be easily replaced later. 

  


  



> Native HBase timestamp support to optimize date range queries in Phoenix 
> -------------------------------------------------------------------------
>
>                 Key: PHOENIX-914
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-914
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.0.0
>            Reporter: Vladimir Rodionov
>            Assignee: Jeffrey Zhong
>         Attachments: PHOENIX-914.patch
>
>
> For many applications one of the column of a table can be (and must be) naturally mapped

> to HBase timestamp. What it gives us is the optimization on StoreScanner where HFiles
with timestamps out of range of
> a Scan operator will be omitted. Let us say that we have time-series type of data (EVENTS)
and custom compaction, where we create 
> series of HFiles with continuous non-overlapping timestamp ranges.
> CREATE TABLE IF NOT EXISTS ODS.EVENTS (
>     METRICID  VARCHAR NOT NULL,
>     METRICNAME VARCHAR,
>     SERVICENAME VARCHAR NOT NULL,
>     ORIGIN VARCHAR NOT NULL,
>     APPID VARCHAR,
>     IPID VARCHAR,
>     NVALUE DOUBLE,
>     TIME TIMESTAMP NOT NULL  /+ TIMESTAMP +/,
>     DATA VARCHAR,
>     SVALUE VARCHAR
>     CONSTRAINT PK PRIMARY KEY (METRICID, SERVICENAME, ORIGIN, APPID, IPID, TIME)
> ) SALT_BUCKETS=40, IMMUTABLE_ROWS=true,VERSIONS=1,DATA_BLOCK_ENCODING='NONE';
> Make note on   TIME TIMESTAMP NOT NULL  /+ TIMESTAMP +/ - this is the Hint to Phoenix
that the column
> TIME must be mapped to HBase timestamp. 
> The Query:
> Select all events of type 'X' for last 7 days
> SELECT * from EVENTS WHERE METRICID = 'X' and TIME < NOW() and TIME > NOW() - 7*24*3600000;
(this may be not correct SQL syntax of course)
> These types of queries will be efficiently optimized if:
> 1. Phoenix maps  TIME column to HBase timestamp
> 2. Phoenix smart enough to map WHERE clause on TIME attribute to Scan timerange 
> Although this :
> Properties props = new Properties();
> props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));
> Connection conn = DriverManager.connect(myUrl, props);
> conn.createStatement().execute("UPSERT INTO myTable VALUES ('a')");
> conn.commit();
> will work in my case- it may not be efficient from performance point of view because
for every INSERT/UPSERT 
> new Connection object and new Statement is created, beside this we still need the optimization
2. (see above). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message