phoenix-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-3744) Support snapshot scanners for MR-based queries
Date Wed, 24 May 2017 17:20:04 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16023267#comment-16023267 ]

ASF GitHub Bot commented on PHOENIX-3744:
-----------------------------------------

Github user JamesRTaylor commented on a diff in the pull request:

    https://github.com/apache/phoenix/pull/239#discussion_r118312938
  
    --- Diff: phoenix-core/src/main/java/org/apache/phoenix/iterate/DefaultParallelScanGrouper.java ---
    @@ -17,46 +17,79 @@
      */
     package org.apache.phoenix.iterate;
     
    +import java.sql.SQLException;
     import java.util.List;
     
    +import com.google.common.base.Preconditions;
    +import org.apache.hadoop.hbase.HRegionLocation;
     import org.apache.hadoop.hbase.client.Scan;
     import org.apache.phoenix.compile.QueryPlan;
    +import org.apache.phoenix.compile.StatementContext;
     import org.apache.phoenix.schema.PTable;
     import org.apache.phoenix.schema.PTable.IndexType;
     import org.apache.phoenix.schema.SaltingUtil;
    +import org.apache.phoenix.schema.TableRef;
     import org.apache.phoenix.util.ScanUtil;
     
     /**
      * Default implementation that creates a scan group if a plan is row key ordered (which requires a merge sort),
    - * or if a scan crosses a region boundary and the table is salted or a local index. 
    + * or if a scan crosses a region boundary and the table is salted or a local index.
      */
     public class DefaultParallelScanGrouper implements ParallelScanGrouper {
    -	
    -	private static final DefaultParallelScanGrouper INSTANCE = new DefaultParallelScanGrouper();
     
    -    public static DefaultParallelScanGrouper getInstance() {
    -        return INSTANCE;
    -    }
    -    
    -    private DefaultParallelScanGrouper() {}
    -
    -	@Override
    -	public boolean shouldStartNewScan(QueryPlan plan, List<Scan> scans, byte[] startKey, boolean crossedRegionBoundary) {
    -		PTable table = plan.getTableRef().getTable();
    -		boolean startNewScanGroup = false;
    -        if (!plan.isRowKeyOrdered()) {
    -            startNewScanGroup = true;
    -        } else if (crossedRegionBoundary) {
    -            if (table.getIndexType() == IndexType.LOCAL) {
    -                startNewScanGroup = true;
    -            } else if (table.getBucketNum() != null) {
    -                startNewScanGroup = scans.isEmpty() ||
    -                        ScanUtil.crossesPrefixBoundary(startKey,
    -                                ScanUtil.getPrefix(scans.get(scans.size()-1).getStartRow(), SaltingUtil.NUM_SALTING_BYTES), 
    -                                SaltingUtil.NUM_SALTING_BYTES);
    -            }
    -        }
    -        return startNewScanGroup;
    +  private static DefaultParallelScanGrouper INSTANCE = new DefaultParallelScanGrouper();
    --- End diff --
    
    I don't think that DefaultParallelScanGrouper can be a singleton with the state of context and tableName inside of it.
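The reviewer's objection is that one shared singleton is only safe while the grouper is stateless. Once per-query fields such as the `StatementContext` and table name live inside it, every query shares those fields and can overwrite another query's values. The Java sketch below is a minimal illustration under simplified assumptions and is not the patch's code. `QueryInfo`, `StatelessGrouper`, and `PerQueryGrouper` are hypothetical stand-ins for `QueryPlan`, `PTable`, and the grouper. `crossesPrefix` is a simplified approximation of `ScanUtil.crossesPrefixBoundary` that assumes a one-byte salt prefix. The stateless class ports the rule from the removed `shouldStartNewScan`, so a single shared instance is safe. The stateful class has to be constructed once per query.

```java
import java.util.Arrays;
import java.util.List;

public class ScanGrouperSketch {

    // Hypothetical stand-in for the facts the real grouper reads from
    // QueryPlan and PTable; not a Phoenix type.
    static final class QueryInfo {
        final boolean rowKeyOrdered;
        final boolean localIndex;
        final Integer bucketNum; // non-null means the table is salted

        QueryInfo(boolean rowKeyOrdered, boolean localIndex, Integer bucketNum) {
            this.rowKeyOrdered = rowKeyOrdered;
            this.localIndex = localIndex;
            this.bucketNum = bucketNum;
        }
    }

    // Assumed salt prefix length (one byte) for this sketch.
    static final int SALT_BYTES = 1;

    // Simplified approximation of ScanUtil.crossesPrefixBoundary: true when the
    // key's leading salt bytes differ from those of the previous start row.
    // Arrays.copyOf zero-pads a short (e.g. empty) previous start row.
    static boolean crossesPrefix(byte[] key, byte[] previousStartRow, int prefixLength) {
        if (key.length < prefixLength) {
            return true;
        }
        return !Arrays.equals(Arrays.copyOf(key, prefixLength),
                              Arrays.copyOf(previousStartRow, prefixLength));
    }

    // Stateless: every input is a parameter, so one shared instance is safe.
    static final class StatelessGrouper {
        static final StatelessGrouper INSTANCE = new StatelessGrouper();

        private StatelessGrouper() {}

        boolean shouldStartNewScan(QueryInfo q, List<byte[]> scanStartRows,
                                   byte[] startKey, boolean crossedRegionBoundary) {
            if (!q.rowKeyOrdered) {
                return true;
            }
            if (!crossedRegionBoundary) {
                return false;
            }
            if (q.localIndex) {
                return true;
            }
            if (q.bucketNum != null) {
                return scanStartRows.isEmpty()
                        || crossesPrefix(startKey, scanStartRows.get(scanStartRows.size() - 1), SALT_BYTES);
            }
            return false;
        }
    }

    // Holds per-query state (analogous to the context and tableName fields in
    // the patch), so each query needs its own instance instead of a singleton.
    static final class PerQueryGrouper {
        private final QueryInfo query;
        private final String tableName;

        PerQueryGrouper(QueryInfo query, String tableName) {
            this.query = query;
            this.tableName = tableName;
        }

        String tableName() {
            return tableName;
        }

        boolean shouldStartNewScan(List<byte[]> scanStartRows, byte[] startKey,
                                   boolean crossedRegionBoundary) {
            return StatelessGrouper.INSTANCE.shouldStartNewScan(
                    query, scanStartRows, startKey, crossedRegionBoundary);
        }
    }

    public static void main(String[] args) {
        QueryInfo salted = new QueryInfo(true, false, 4);
        List<byte[]> starts = List.of(new byte[] {0, 10});
        // Same salt byte as the last scan: stay in the current group (false).
        System.out.println(StatelessGrouper.INSTANCE.shouldStartNewScan(
                salted, starts, new byte[] {0, 20}, true));
        // Different salt byte: start a new group (true).
        System.out.println(StatelessGrouper.INSTANCE.shouldStartNewScan(
                salted, starts, new byte[] {1, 0}, true));
        // Two queries each get their own stateful grouper.
        PerQueryGrouper a = new PerQueryGrouper(salted, "T1");
        PerQueryGrouper b = new PerQueryGrouper(new QueryInfo(false, false, null), "T2");
        System.out.println(a.tableName() + " " + b.tableName());
    }
}
```

When the per-query data is passed as method parameters, the shared instance stays safe. When that data is stored in fields, as the patch does, each query needs its own grouper instance.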


> Support snapshot scanners for MR-based queries
> ----------------------------------------------
>
>                 Key: PHOENIX-3744
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3744
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: Akshita Malhotra
>         Attachments: PHOENIX-3744.patch
>
>
> HBase supports scanning over snapshots, with a SnapshotScanner that accesses the region directly in HDFS. We should make sure that Phoenix can support that.
> Not sure how we'd want to decide when to run a query over a snapshot. Some ideas:
> - if there's an SCN set (i.e. the query is running at a point in time in the past)
> - if the memstore is empty
> - if the query is being run at a timestamp earlier than any memstore data
> - as a config option on the table
> - as a query hint
> - based on some kind of optimizer rule (e.g. based on the estimated number of bytes that will be scanned)
> Phoenix typically runs a query at the timestamp at which it was compiled. Any data committed after this time should not be seen while a query is running.
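The final paragraph of the issue describes a read-timestamp rule. A query sees only data committed at or before the timestamp at which it was compiled. The following Java sketch is a minimal, self-contained illustration of that visibility rule and does not reproduce Phoenix's or HBase's implementation. `Version` and `readAt` are hypothetical names, and the version list stands in for the timestamped cell versions HBase keeps. Treating the read timestamp as an inclusive bound is an assumption of the sketch; HBase's own `Scan.setTimeRange` uses an exclusive upper bound.

```java
import java.util.List;
import java.util.Optional;

public class ReadTimestampSketch {

    // Hypothetical, simplified cell version: a value written at a timestamp.
    static final class Version {
        final long timestamp;
        final String value;

        Version(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Returns the newest version whose timestamp is <= readTs, so anything
    // committed after the query's read timestamp is ignored.
    static Optional<String> readAt(List<Version> versions, long readTs) {
        Version best = null;
        for (Version v : versions) {
            if (v.timestamp <= readTs && (best == null || v.timestamp > best.timestamp)) {
                best = v;
            }
        }
        return best == null ? Optional.empty() : Optional.of(best.value);
    }

    public static void main(String[] args) {
        List<Version> versions = List.of(
                new Version(100, "a"), new Version(200, "b"), new Version(300, "c"));
        long compileTs = 250; // hypothetical compile timestamp of the query
        System.out.println(readAt(versions, compileTs).orElse("<none>")); // b
        System.out.println(readAt(versions, 50).orElse("<none>"));        // <none>
    }
}
```

In this simplified model, the answer depends only on timestamps. A scan over snapshot files would therefore return the same result as a scan against the live region, provided those files contain every version up to the read timestamp.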



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
