phoenix-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [phoenix] dbwong commented on a change in pull request #482: PHOENIX-4925 Use Segment tree to organize Guide Post Info
Date Tue, 23 Apr 2019 23:17:29 GMT
dbwong commented on a change in pull request #482: PHOENIX-4925 Use Segment tree to organize Guide Post Info
URL: https://github.com/apache/phoenix/pull/482#discussion_r277883057
 
 

 ##########
 File path: phoenix-core/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java
 ##########
 @@ -885,292 +867,215 @@ private static boolean clipKeyRangeBytes(RowKeySchema schema, int fieldIndex, in
         return maxOffset != offset;
     }
 
+    private List<KeyRange> getRowKeyRanges(List<HRegionLocation> regionLocations, boolean isLocalIndex) {
+        List<KeyRange> queryKeyRanges = null;
+
+        // Use the dataplan to build the queryRowKeyRanges
+        if (isLocalIndex) {
+            // TODO: when implementing PHOENIX-4585, we should change this to an assert
+            // as we should always have a data plan when a local index is being used.
+            if (dataPlan != null && dataPlan.getTableRef().getTable().getType() != PTableType.INDEX) { // Sanity check
+                int columnsInCommon = computeColumnsInCommon();
+                ScanRanges prefixScanRanges = computePrefixScanRanges(dataPlan.getContext().getScanRanges(), columnsInCommon);
+                List<KeyRange> queryKeyRangesTemp = prefixScanRanges.getRowKeyRanges();
+
+                if (queryKeyRangesTemp.size() == 1 && queryKeyRangesTemp.get(0) == KeyRange.EVERYTHING_RANGE) {
+                    queryKeyRanges = queryKeyRangesTemp;
+                } else {
+                    List<KeyRange> newQueryRowKeyRanges = Lists.newArrayListWithExpectedSize(
+                            queryKeyRangesTemp.size() * regionLocations.size());
+
+                    for (HRegionLocation regionLocation : regionLocations) {
+                        HRegionInfo regionInfo = regionLocation.getRegionInfo();
+
+                        // Only attempt further pruning if the prefix range is using
+                        // a skip scan since we've already pruned the range of regions
+                        // based on the start/stop key.
+                        if (columnsInCommon > 0 && prefixScanRanges.useSkipScanFilter()) {
+                            byte[] regionStartKey = regionInfo.getStartKey();
+                            ImmutableBytesWritable ptr = context.getTempPtr();
+                            clipKeyRangeBytes(prefixScanRanges.getSchema(), 0,
+                                    columnsInCommon, regionStartKey, ptr, false);
+                            regionStartKey = ByteUtil.copyKeyBytesIfNecessary(ptr);
+                            // Prune this region if there's no intersection
+                            if (!prefixScanRanges.intersectRegion(regionStartKey, regionInfo.getEndKey(), false)) {
+                                continue;
+                            }
+                        }
+
+                        for (KeyRange queryKeyRange : queryKeyRangesTemp) {
+                            KeyRange newQueryRowKeyRange = queryKeyRange.prependRange(
+                                    regionInfo.getStartKey(),0,regionInfo.getStartKey().length);
+                            newQueryRowKeyRanges.add(newQueryRowKeyRange);
+                        }
+                    }
+
+                    queryKeyRanges = newQueryRowKeyRanges;
+                }
+            }
+        }
+
+        if (queryKeyRanges == null) {
+            ScanRanges scanRanges = context.getScanRanges();
+            queryKeyRanges = scanRanges.getRowKeyRanges();
+        }
+
+        return queryKeyRanges;
+    }
+
     /**
      * Compute the list of parallel scans to run for a given query. The inner scans
      * may be concatenated together directly, while the other ones may need to be
-     * merge sorted, depending on the query.
-     * Also computes an estimated bytes scanned, rows scanned, and last update time
-     * of statistics. To compute correctly, we need to handle a couple of edge cases:
-     * 1) if a guidepost is equal to the start key of the scan.
-     * 2) If a guidepost is equal to the end region key.
-     * In both cases, we set a flag (delayAddingEst) which indicates that the previous
-     * gp should be use in our stats calculation. The normal case is that a gp is
-     * encountered which is in the scan range in which case it is simply added to
-     * our calculation.
-     * For the last update time, we use the min timestamp of the gp that are in
-     * range of the scans that will be issued. If we find no gp in the range, we use
-     * the gp in the first or last region of the scan. If we encounter a region with
-     * no gp, then we return a null value as an indication that we don't know with
-     * certainty when the stats were updated last. This handles the case of a split
-     * occurring for a large ingest with stats never having been calculated for the
-     * new region.
+     * merge sorted, depending on the query. Also computes an estimated bytes scanned,
+     * rows scanned, and last update time of statistics.
+     *
      * @return list of parallel scans to run for a given query.
      * @throws SQLException
      */
     private List<List<Scan>> getParallelScans(byte[] startKey, byte[] stopKey) throws SQLException {
+        ScanRanges scanRanges = context.getScanRanges();
         List<HRegionLocation> regionLocations = getRegionBoundaries(scanGrouper);
         List<byte[]> regionBoundaries = toBoundaries(regionLocations);
-        ScanRanges scanRanges = context.getScanRanges();
+
         PTable table = getTable();
         boolean isSalted = table.getBucketNum() != null;
         boolean isLocalIndex = table.getIndexType() == IndexType.LOCAL;
-        GuidePostsInfo gps = getGuidePosts();
-        // case when stats wasn't collected
-        hasGuidePosts = gps != GuidePostsInfo.NO_GUIDEPOST;
-        // Case when stats collection did run but there possibly wasn't enough data. In such a
-        // case we generate an empty guide post with the byte estimate being set as guide post
-        // width.
-        boolean emptyGuidePost = gps.isEmptyGuidePost();
-        byte[] startRegionBoundaryKey = startKey;
-        byte[] stopRegionBoundaryKey = stopKey;
-        int columnsInCommon = 0;
-        ScanRanges prefixScanRanges = ScanRanges.EVERYTHING;
-        boolean traverseAllRegions = isSalted || isLocalIndex;
-        if (isLocalIndex) {
-            // TODO: when implementing PHOENIX-4585, we should change this to an assert
-            // as we should always have a data plan when a local index is being used.
-            if (dataPlan != null && dataPlan.getTableRef().getTable().getType() != PTableType.INDEX) { // Sanity check
-                prefixScanRanges = computePrefixScanRanges(dataPlan.getContext().getScanRanges(), columnsInCommon=computeColumnsInCommon());
-                KeyRange prefixRange = prefixScanRanges.getScanRange();
-                if (!prefixRange.lowerUnbound()) {
-                    startRegionBoundaryKey = prefixRange.getLowerRange();
-                }
-                if (!prefixRange.upperUnbound()) {
-                    stopRegionBoundaryKey = prefixRange.getUpperRange();
-                }
+        // We'll never have a case where a table is both salted and local.
+        assert !(isSalted && isLocalIndex);
+
+        Long pageLimit = getUnfilteredPageLimit(scan);
 
 Review comment:
   I had a thought for a follow-up JIRA: if pageLimit approaches the size of our stored guideposts,
we can probably improve our estimate by using a combination of the data.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services
