hbase-issues mailing list archives

From "Phabricator (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-4823) long running scans lose benefit of bloomfilters and timerange hints
Date Wed, 23 Nov 2011 18:59:41 GMT

     [ https://issues.apache.org/jira/browse/HBASE-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-4823:
-------------------------------

    Attachment: HBASE-4823.D519.1.patch

aaiyer requested code review of "HBASE-4823 [jira] long running scans lose benefit of bloomfilters
and timerange hints".
Reviewers: JIRA

  Changes the StoreScanner so that whenever we do a resetScannerStack, we
  select scanners the same way the constructor does (passing the scan to
  getScanners()), so that files the scan will not touch are ignored.

  Includes a test to ensure correctness.

  When you have a long running scan due to, say, an MR job, you can lose the benefit of timerange
hints & bloom filters midway if your scanner gets reset. [Note: The scanners can get reset,
say, due to a flush or compaction.]

  In one of our workloads, we periodically want to do rollups on the most recent 15 minutes of data
in a column family, but the timerange hint benefit is lost midway when this resetScannerStack
(shown below) happens. The end result: we end up reading all the old HFiles rather than just
the recent HFiles.

{code}
  private void resetScannerStack(KeyValue lastTopKey) throws IOException {
    if (heap != null) {
      throw new RuntimeException("StoreScanner.reseek run on an existing heap!");
    }
    /* When we have the scan object, should we not pass it to getScanners()
     * to get a limited set of scanners? We did so in the constructor and we
     * could have done it now by storing the scan object from the constructor */
    List<KeyValueScanner> scanners = getScanners();
{code}

  The comment in the code seems to be aware of this issue and even has the suggested fix!
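
  To make the effect concrete, here is a rough, self-contained sketch of the idea. It is a toy
model, not the HBase code and not the attached patch: FileInfo, TimeRange, filesToOpen and
sampleFiles are hypothetical names invented for illustration, and the timestamps are made up.
It only shows why a reset that drops the scan's time range ends up opening every file, while
a reset that reuses the range stored at construction time opens only the recent ones.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: every type here is a hypothetical stand-in, not an HBase class.
class ScannerResetSketch {

    /** A store file, reduced to the timestamp span of the cells it contains. */
    static final class FileInfo {
        final String name;
        final long minTs; // inclusive
        final long maxTs; // inclusive

        FileInfo(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    /** The scan's time range hint, modelled as the half-open interval [minTs, maxTs). */
    static final class TimeRange {
        final long minTs; // inclusive
        final long maxTs; // exclusive

        TimeRange(long minTs, long maxTs) {
            this.minTs = minTs;
            this.maxTs = maxTs;
        }

        /** True if the file's timestamp span overlaps this range. */
        boolean mayContain(FileInfo f) {
            return f.maxTs >= minTs && f.minTs < maxTs;
        }
    }

    /** Names of the files a scanner would open; a null hint means "open everything". */
    static List<String> filesToOpen(List<FileInfo> files, TimeRange hint) {
        List<String> names = new ArrayList<>();
        for (FileInfo f : files) {
            if (hint == null || hint.mayContain(f)) {
                names.add(f.name);
            }
        }
        return names;
    }

    /** Made-up store contents: two old files, one recent file, one freshly flushed file. */
    static List<FileInfo> sampleFiles() {
        List<FileInfo> files = new ArrayList<>();
        files.add(new FileInfo("old-1", 0, 3_000));
        files.add(new FileInfo("old-2", 3_001, 9_000));
        files.add(new FileInfo("recent", 9_001, 9_500));
        files.add(new FileInfo("flushed", 9_501, 9_990));
        return files;
    }

    public static void main(String[] args) {
        List<FileInfo> files = sampleFiles();
        TimeRange recentWindow = new TimeRange(9_100, 10_000);

        // Reset that forgets the scan: every file is opened again.
        System.out.println("reset without hint: " + filesToOpen(files, null));
        // Reset that reuses the stored scan: only overlapping files are opened.
        System.out.println("reset with hint:    " + filesToOpen(files, recentWindow));
    }
}
```

  In this model the reset without the hint opens all four files, and the reset with the hint
opens only "recent" and "flushed". Bloom filters are not modelled here; the same reasoning
would apply to any per-file check that depends on the scan.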

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D519

AFFECTED FILES
  src/test/java/org/apache/hadoop/hbase/regionserver/TestScannerResets.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java


                
> long running scans lose benefit of bloomfilters and timerange hints
> -------------------------------------------------------------------
>
>                 Key: HBASE-4823
>                 URL: https://issues.apache.org/jira/browse/HBASE-4823
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Kannan Muthukkaruppan
>            Assignee: Amitanand Aiyer
>         Attachments: HBASE-4823.D519.1.patch, TestScannerResets-89fb.txt
>
>
> When you have a long running scan due to, say, an MR job, you can lose the benefit of timerange
> hints & bloom filters midway if your scanner gets reset. [Note: The scanners can get reset,
> say, due to a flush or compaction.]
> In one of our workloads, we periodically want to do rollups on the most recent 15 minutes of data
> in a column family, but the timerange hint benefit is lost midway when this resetScannerStack
> (shown below) happens. The end result: we end up reading all the old HFiles rather than just
> the recent HFiles.
> {code}
>  private void resetScannerStack(KeyValue lastTopKey) throws IOException {
>     if (heap != null) {
>       throw new RuntimeException("StoreScanner.reseek run on an existing heap!");
>     }
>     /* When we have the scan object, should we not pass it to getScanners()
>      * to get a limited set of scanners? We did so in the constructor and we
>      * could have done it now by storing the scan object from the constructor */
>     List<KeyValueScanner> scanners = getScanners();
> {code}
> The comment in the code seems to be aware of this issue and even has the suggested fix!


        
