phoenix-dev mailing list archives

From "Akshita Malhotra (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (PHOENIX-3744) Support snapshot scanners for MR-based queries
Date Mon, 24 Apr 2017 20:42:04 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15981853#comment-15981853 ]

Akshita Malhotra edited comment on PHOENIX-3744 at 4/24/17 8:41 PM:
--------------------------------------------------------------------

The parallel scan grouper is extended so that region boundaries are obtained differently
for snapshot reads.

Added an integration test that compares the snapshot read result with the result of a
SELECT query run with the CurrentScn value set.
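
A minimal sketch of the comparison such a test performs, not the actual test from the patch. It assumes the SELECT side pins its connection to a point in time through Phoenix's "CurrentSCN" connection property; the class name, helper names, and the row representation are hypothetical.

```java
import java.util.List;
import java.util.Properties;

final class ScnComparisonSketch {
    // Phoenix connection property that pins a connection to a timestamp.
    static final String CURRENT_SCN = "CurrentSCN";

    /** Builds connection properties for a point-in-time read (hypothetical helper). */
    static Properties propsAtScn(long scnMillis) {
        Properties props = new Properties();
        props.setProperty(CURRENT_SCN, Long.toString(scnMillis));
        return props;
    }

    /**
     * Compares rows read from the snapshot with rows returned by the SELECT.
     * The comparison is order-sensitive, so both sides should be sorted the same way.
     */
    static boolean sameRows(List<List<Object>> snapshotRows, List<List<Object>> selectRows) {
        return snapshotRows.equals(selectRows);
    }
}
```

In a real test the properties would be passed to DriverManager.getConnection(url, props) with the cluster's JDBC URL, and both row lists would be filled from the MR job output and the JDBC ResultSet respectively.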

The configuration parameter is the snapshot name key; if it is set, a snapshot read is performed.
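
The switch described above can be sketched roughly as follows. This is an illustration only: the key string, the class, and the use of a plain Map in place of a Hadoop Configuration are all assumptions, not the names used in the patch.

```java
import java.util.Map;
import java.util.Optional;

final class SnapshotReadSwitch {
    // Hypothetical key name; the real key is defined in the patch, not here.
    static final String SNAPSHOT_NAME_KEY = "example.mapreduce.snapshot.name";

    /** Returns the snapshot name if the key is set to a non-blank value. */
    static Optional<String> snapshotName(Map<String, String> conf) {
        String name = conf.get(SNAPSHOT_NAME_KEY);
        if (name == null || name.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(name.trim());
    }

    /** Chooses the read path: snapshot scanner when a name is configured, client scanner otherwise. */
    static String chooseScanner(Map<String, String> conf) {
        return snapshotName(conf).isPresent() ? "TableSnapshotScanner" : "ClientScanner";
    }
}
```

Treating a blank value as "not set" is a design choice of this sketch, so that an empty property does not trigger a snapshot read by accident.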

Used the existing PhoenixIndexDBWritable class for testing; will add a new one
as I add more tests.

ExpressionProjector functionality is extended for snapshots because the keyvalue format returned
by TableSnapshotScanner differs from that of ClientScanner and is therefore not properly interpreted
by Phoenix, which returns null for projected columns.
For the same table, the following shows the two keyvalue formats:

ClientScanner:
keyvalues={AAPL/_v:\x00\x00\x00\x01/1493061452132/Put/vlen=7/seqid=0/value=�SSDD��}

TableSnapshotScanner:
keyvalues={AAPL/0:\x00\x00\x00\x00/1493061673859/Put/vlen=1/seqid=4/value=x,
AAPL/0:\x80\x0B/1493061673859/Put/vlen=4/seqid=4/value=SSDD}
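
To make the difference concrete, here is a rough sketch that tells the two shapes apart, assuming (from the output above only) that the ClientScanner row carries a single cell in family `_v` while the TableSnapshotScanner row carries one raw cell per column in family `0`. The Cell class and both helpers are hypothetical stand-ins, not HBase or Phoenix types, and this is not how ExpressionProjector itself is implemented.

```java
import java.util.Arrays;
import java.util.List;

final class KeyValueShapeSketch {
    /** Hypothetical stand-in for an HBase cell: family, qualifier, and value only. */
    static final class Cell {
        final String family;
        final byte[] qualifier;
        final byte[] value;

        Cell(String family, byte[] qualifier, byte[] value) {
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Family seen in the ClientScanner output above.
    static final String PROJECTED_FAMILY = "_v";

    /** True when the row is a single cell in the `_v` family. */
    static boolean isProjectedForm(List<Cell> cells) {
        return cells.size() == 1 && PROJECTED_FAMILY.equals(cells.get(0).family);
    }

    /** Looks up one column's value in a raw (snapshot-style) row; null if absent. */
    static byte[] rawValue(List<Cell> cells, String family, byte[] qualifier) {
        for (Cell c : cells) {
            if (c.family.equals(family) && Arrays.equals(c.qualifier, qualifier)) {
                return c.value;
            }
        }
        return null;
    }
}
```

A projector that expects only the first shape finds no `_v` cell in a snapshot row, which is consistent with the null results described above.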

To do:
Add more integration tests to cover other scenarios, such as WHERE clauses.

[~jamestaylor]



> Support snapshot scanners for MR-based queries
> ----------------------------------------------
>
>                 Key: PHOENIX-3744
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3744
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: Akshita Malhotra
>
> HBase supports scanning over snapshots, with a SnapshotScanner that accesses the region
> directly in HDFS. We should make sure that Phoenix can support that.
> Not sure how we'd want to decide when to run a query over a snapshot. Some ideas:
> - if there's an SCN set (i.e. the query is running at a point in time in the past)
> - if the memstore is empty
> - if the query is being run at a timestamp earlier than any memstore data
> - as a config option on the table
> - as a query hint
> - based on some kind of optimizer rule (i.e. based on estimated # of bytes that will
>   be scanned)
> Phoenix typically runs a query at the timestamp at which it was compiled. Any data committed
> after this time should not be seen while a query is running.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
