phoenix-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-3744) Support snapshot scanners for MR-based queries
Date Mon, 08 May 2017 21:13:04 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16001550#comment-16001550 ]

ASF GitHub Bot commented on PHOENIX-3744:
-----------------------------------------

Github user JamesRTaylor commented on the issue:

    https://github.com/apache/phoenix/pull/239
  
    I don't think it's necessary to fully understand the functionality to do the refactoring I've mentioned, @akshita-malhotra. Here's how I'd recommend approaching it:
    
    * create a new interface called RegionContext, solely to abstract access to RegionCoprocessorEnvironment. The interface would have at least two methods: getRegion and getConfiguration. We might need more if other RegionCoprocessorEnvironment methods are called.
    * have two implementations of this interface: RegionCoprocessorContext and RegionSnapshotContext. The constructor of RegionCoprocessorContext would take a RegionCoprocessorEnvironment as an argument, while the constructor of RegionSnapshotContext would take a Region and a Configuration.
    * do an across-the-board replacement of RegionCoprocessorEnvironment with RegionContext. You can likely skip this for secondary-index-related code (org.apache.phoenix.hbase.index.Indexer and PhoenixTransactionalIndexer). This is where you'll find out whether other methods are called on RegionCoprocessorEnvironment or ObserverContext. Those can be dealt with in a variety of ways, for example by throwing an UnsupportedOperationException in the snapshot implementation if need be.
    * in the top-level coprocessor methods that pass in a RegionCoprocessorEnvironment (mostly in the abstract BaseScannerRegionObserver class), instantiate a RegionCoprocessorContext by passing in the RegionCoprocessorEnvironment. From this point onward, all access will go through the RegionContext interface.
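
The steps above could be sketched roughly as follows. This is only an illustration of the shape of the refactoring, not Phoenix code: Region, Configuration and RegionCoprocessorEnvironment are small stand-in types defined here so the example compiles on its own (the real ones come from HBase and Hadoop), and the getName, set, get and describe helpers are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-ins so the sketch compiles without HBase on the classpath (assumptions).
// In Phoenix these would be the real HBase Region, Hadoop Configuration and
// HBase RegionCoprocessorEnvironment types.
final class Region {
    private final String name;
    Region(String name) { this.name = name; }
    String getName() { return name; }
}

final class Configuration {
    private final Map<String, String> values = new HashMap<>();
    void set(String key, String value) { values.put(key, value); }
    String get(String key) { return values.get(key); }
}

interface RegionCoprocessorEnvironment {
    Region getRegion();
    Configuration getConfiguration();
}

// The narrow abstraction that region-level code would depend on.
interface RegionContext {
    Region getRegion();
    Configuration getConfiguration();
}

// Coprocessor path: delegates every call to the wrapped environment.
final class RegionCoprocessorContext implements RegionContext {
    private final RegionCoprocessorEnvironment env;
    RegionCoprocessorContext(RegionCoprocessorEnvironment env) { this.env = env; }
    @Override public Region getRegion() { return env.getRegion(); }
    @Override public Configuration getConfiguration() { return env.getConfiguration(); }
}

// Snapshot path: no coprocessor environment exists, so the region and
// configuration are supplied directly.
final class RegionSnapshotContext implements RegionContext {
    private final Region region;
    private final Configuration conf;
    RegionSnapshotContext(Region region, Configuration conf) {
        this.region = region;
        this.conf = conf;
    }
    @Override public Region getRegion() { return region; }
    @Override public Configuration getConfiguration() { return conf; }
}

class RegionContextDemo {
    // Code written against RegionContext works with either implementation.
    static String describe(RegionContext context) {
        return context.getRegion().getName()
                + " batch=" + context.getConfiguration().get("batch");
    }

    public static void main(String[] args) {
        Region region = new Region("T1");
        Configuration conf = new Configuration();
        conf.set("batch", "100");
        RegionContext context = new RegionSnapshotContext(region, conf);
        System.out.println(describe(context)); // prints "T1 batch=100"
    }
}
```

Callers such as describe never see which implementation they were handed, which is what would let the snapshot code reuse the existing scanner logic.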
    
    You could do this refactoring completely separately from PHOENIX-3744 so that you don't mix the two. PHOENIX-3744 would then have something like a RegionScannerFactory (your RegionObserverUtil) that gives you back a RegionScanner given a RegionContext, and you'd create a RegionSnapshotContext as the backing implementation in your snapshot-reading code.
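
A rough sketch of how such a factory might look from the snapshot-reading side. Everything here is an assumption made for illustration: the types are simplified stand-ins rather than the HBase ones, a "region" is just a list of row strings, and the getRegionScanner method name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-ins for the HBase/Hadoop types (assumptions, not the real APIs).
final class Region {
    private final List<String> rows;
    Region(List<String> rows) { this.rows = rows; }
    List<String> getRows() { return rows; }
}

final class Configuration { }

interface RegionContext {
    Region getRegion();
    Configuration getConfiguration();
}

// Snapshot-backed context: built directly from a Region and a Configuration.
final class RegionSnapshotContext implements RegionContext {
    private final Region region;
    private final Configuration conf;
    RegionSnapshotContext(Region region, Configuration conf) {
        this.region = region;
        this.conf = conf;
    }
    @Override public Region getRegion() { return region; }
    @Override public Configuration getConfiguration() { return conf; }
}

// Simplified scanner stand-in.
interface RegionScanner {
    /** Appends the next row, if any, to results; returns true if more rows remain. */
    boolean next(List<String> results);
}

// Builds a scanner from nothing but a RegionContext, so the same code can
// serve both the coprocessor path and the snapshot path.
final class RegionScannerFactory {
    private RegionScannerFactory() { }

    static RegionScanner getRegionScanner(RegionContext context) {
        final Iterator<String> rows = context.getRegion().getRows().iterator();
        return new RegionScanner() {
            @Override public boolean next(List<String> results) {
                if (rows.hasNext()) {
                    results.add(rows.next());
                }
                return rows.hasNext();
            }
        };
    }
}

class SnapshotReadDemo {
    public static void main(String[] args) {
        RegionContext context = new RegionSnapshotContext(
                new Region(Arrays.asList("row1", "row2")), new Configuration());
        RegionScanner scanner = RegionScannerFactory.getRegionScanner(context);
        List<String> results = new ArrayList<>();
        boolean more;
        do {
            more = scanner.next(results);
        } while (more);
        System.out.println(results); // prints [row1, row2]
    }
}
```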
    
    
    
    



> Support snapshot scanners for MR-based queries
> ----------------------------------------------
>
>                 Key: PHOENIX-3744
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3744
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: Akshita Malhotra
>
> HBase supports scanning over snapshots, with a SnapshotScanner that accesses the region directly in HDFS. We should make sure that Phoenix can support that.
> Not sure how we'd want to decide when to run a query over a snapshot. Some ideas:
> - if there's an SCN set (i.e. the query is running at a point in time in the past)
> - if the memstore is empty
> - if the query is being run at a timestamp earlier than any memstore data
> - as a config option on the table
> - as a query hint
> - based on some kind of optimizer rule (i.e. based on estimated # of bytes that will be scanned)
> Phoenix typically runs a query at the timestamp at which it was compiled. Any data committed after this time should not be seen while a query is running.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
