hbase-issues mailing list archives

From "Guanghao Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-19818) Scan time limit not work if the filter always filter row key
Date Thu, 18 Jan 2018 02:24:00 GMT

     [ https://issues.apache.org/jira/browse/HBASE-19818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guanghao Zhang updated HBASE-19818:
-----------------------------------
    Description: 
The relevant code is in the nextInternal() method of [https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java]:
{code:java}
// Check if rowkey filter wants to exclude this row. If so, loop to next.
// Technically, if we hit limits before on this row, we don't need this call.
if (filterRowKey(current)) {
  incrementCountOfRowsFilteredMetric(scannerContext);
  // early check, see HBASE-16296
  if (isFilterDoneInternal()) {
    return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
  }
  // Typically the count of rows scanned is incremented inside #populateResult. However,
  // here we are filtering a row based purely on its row key, preventing us from calling
  // #populateResult. Thus, perform the necessary increment here to rows scanned metric
  incrementCountOfRowsScannedMetric(scannerContext);
  boolean moreRows = nextRow(scannerContext, current);
  if (!moreRows) {
    return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
  }
  results.clear();
  continue;
}

// Ok, we are good, let's try to get some results from the main heap.
populateResult(results, this.storeHeap, scannerContext, current);
if (scannerContext.checkAnyLimitReached(LimitScope.BETWEEN_CELLS)) {
  if (hasFilterRow) {
    throw new IncompatibleFilterException(
        "Filter whose hasFilterRow() returns true is incompatible with scans that must "
            + " stop mid-row because of a limit. ScannerContext:" + scannerContext);
  }
  return true;
}
{code}
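If filterRowKey() always returns true, the loop hits `continue` on every row and never reaches checkAnyLimitReached(). Skipping that check is fine for the batch and size limits, because nothing has been read for a filtered row. It is wrong for the time limit, though: if the filter rejects every row key, the scanner keeps looping inside nextInternal() for a long time without returning, regardless of the configured time limit.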
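To make the control flow concrete, here is a minimal, self-contained Java sketch. It does not use HBase: `FakeClock`, `TimeLimitContext`, `Outcome`, `Stop` and `next()` are hypothetical stand-ins for the wall clock, `ScannerContext` and `nextInternal()`, and each visited row is assumed to cost one simulated millisecond so the result is deterministic. With `checkTimeInFilteredBranch` disabled, a filter that rejects every row key drives the loop through every row; enabling it adds a time-limit check right before the `continue`, which is one possible way to honour the time limit in this branch, not necessarily the committed fix.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, heavily simplified model of the filterRowKey() branch of
// HRegion#nextInternal(). None of these types exist in HBase.
public class FilteredScanSketch {

    /** Simulated clock so the example is deterministic. */
    static final class FakeClock {
        private long now = 0;
        long now() { return now; }
        void tick() { now++; } // advance by one simulated millisecond
    }

    /** Hypothetical stand-in for ScannerContext; only the time limit is modelled. */
    static final class TimeLimitContext {
        private final FakeClock clock;
        private final long deadline;

        TimeLimitContext(FakeClock clock, long timeLimitMs) {
            this.clock = clock;
            this.deadline = clock.now() + timeLimitMs;
        }

        boolean timeLimitReached() { return clock.now() >= deadline; }
    }

    enum Stop { ROW_RETURNED, TIME_LIMIT, NO_MORE_ROWS }

    static final class Outcome {
        final int rowsVisited;
        final Stop stop;

        Outcome(int rowsVisited, Stop stop) {
            this.rowsVisited = rowsVisited;
            this.stop = stop;
        }

        @Override
        public String toString() { return stop + " after " + rowsVisited + " rows"; }
    }

    /**
     * One call of the simplified loop. A row rejected by rowKeyFilter is skipped with
     * 'continue'; the time limit is consulted before skipping only when
     * checkTimeInFilteredBranch is true.
     */
    static Outcome next(Iterator<String> rows, Predicate<String> rowKeyFilter,
                        TimeLimitContext ctx, FakeClock clock,
                        boolean checkTimeInFilteredBranch) {
        int visited = 0;
        while (rows.hasNext()) {
            String rowKey = rows.next();
            visited++;
            clock.tick(); // assume each row costs one simulated millisecond
            if (rowKeyFilter.test(rowKey)) {
                // Nothing was read, so batch/size limits cannot have advanced,
                // but elapsed time has.
                if (checkTimeInFilteredBranch && ctx.timeLimitReached()) {
                    return new Outcome(visited, Stop.TIME_LIMIT);
                }
                continue;
            }
            // populateResult(...) and checkAnyLimitReached(...) would run here;
            // the sketch simply returns the accepted row.
            return new Outcome(visited, Stop.ROW_RETURNED);
        }
        return new Outcome(visited, Stop.NO_MORE_ROWS);
    }

    static List<String> sampleRows(int n) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rows.add(String.format("row-%04d", i));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> rows = sampleRows(1000);
        Predicate<String> rejectAll = key -> true; // filterRowKey() always returns true

        FakeClock c1 = new FakeClock();
        Outcome unchecked = next(rows.iterator(), rejectAll, new TimeLimitContext(c1, 10), c1, false);
        FakeClock c2 = new FakeClock();
        Outcome checked = next(rows.iterator(), rejectAll, new TimeLimitContext(c2, 10), c2, true);

        System.out.println("without time check: " + unchecked); // NO_MORE_ROWS after 1000 rows
        System.out.println("with time check:    " + checked);   // TIME_LIMIT after 10 rows
    }
}
```

Only the time limit is checked in the filtered branch because, as the report notes, no cells were read there, so the batch and size counters cannot have moved, while wall-clock time still has.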

> Scan time limit not work if the filter always filter row key
> ------------------------------------------------------------
>
>                 Key: HBASE-19818
>                 URL: https://issues.apache.org/jira/browse/HBASE-19818
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
