Date: Fri, 5 Jul 2013 20:39:49 +0000 (UTC)
From: "Jesse Yates (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-8809) Include deletes in the scan (setRaw) method does not respect the time range or the filter


    [ https://issues.apache.org/jira/browse/HBASE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701107#comment-13701107 ]

Jesse Yates commented on HBASE-8809:
------------------------------------

As a slight follow-up to this, it feels like raw scans should also ignore the column version/timestamp filtering. In particular, I'm talking about this section in ScanQueryMatcher:

{code}
    MatchCode colChecker = columns.checkColumn(bytes, offset, qualLength,
        timestamp, type, kv.getMemstoreTS() > maxReadPointToTrackVersions);
    /*
     * According to current implementation, colChecker can only be
     * SEEK_NEXT_COL, SEEK_NEXT_ROW, SKIP or INCLUDE.
     * Therefore, always return the MatchCode. If it is SEEK_NEXT_ROW,
     * also set stickyNextRow.
     */
    ...
{code}

Here the ScanWildcardColumnTracker will not ignore the timestamp in the simple case: with four puts to the same row at different timestamps, the oldest is ignored by default, even though it's still "present" in the store, regardless of the rawness of the scan.

Thoughts?

> Include deletes in the scan (setRaw) method does not respect the time range or the filter
> -----------------------------------------------------------------------------------------
>
>                 Key: HBASE-8809
>                 URL: https://issues.apache.org/jira/browse/HBASE-8809
>             Project: HBase
>          Issue Type: Bug
>          Components: Scanners
>            Reporter: Vasu Mariyala
>            Assignee: Lars Hofhansl
>             Fix For: 0.98.0, 0.95.2, 0.94.10
>
>         Attachments: 8809-0.94.txt, 8809-trunk.txt, DeleteMarkers.doc
>
>
> If a row has been deleted at time stamp 'T' and a scan with time range (0, T-1) is executed, it still returns the delete marker at time stamp 'T'. This is because of the following code in ScanQueryMatcher.java:
> {code}
>       if (retainDeletesInOutput
>           || (!isUserScan && (EnvironmentEdgeManager.currentTimeMillis() - timestamp) <= timeToPurgeDeletes)
>           || kv.getMemstoreTS() > maxReadPointToTrackVersions) {
>         // always include or it is not time yet to check whether it is OK
>         // to purge deltes or not
>         return MatchCode.INCLUDE;
>       }
> {code}
> The assumption is that a scan (even with setRaw set to true) should respect the filters and the time range specified.
> Please let me know if you think this behavior can be changed so that I can provide a patch for it.
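
To make the behaviour reported in the issue concrete, here is a minimal stand-alone sketch. It is only a toy model: Cell, inRange, rawScanAsReported and rawScanExpected are hypothetical names made up for illustration, not HBase classes or the actual ScanQueryMatcher logic. The one assumption borrowed from HBase is that a time range is half-open, [min, max).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical toy model of the report; none of these types are HBase classes. */
final class RawScanTimeRangeSketch {

    /** Simplified stand-in for a stored cell: a timestamp and a delete-marker flag. */
    static final class Cell {
        final long timestamp;
        final boolean deleteMarker;

        Cell(long timestamp, boolean deleteMarker) {
            this.timestamp = timestamp;
            this.deleteMarker = deleteMarker;
        }
    }

    /** Half-open range check [min, max); assumed to mirror how scan time ranges work. */
    static boolean inRange(long ts, long min, long max) {
        return ts >= min && ts < max;
    }

    /** Behaviour as reported: a raw scan returns delete markers without a time-range check. */
    static List<Cell> rawScanAsReported(List<Cell> cells, long min, long max) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            if (c.deleteMarker || inRange(c.timestamp, min, max)) {
                out.add(c);
            }
        }
        return out;
    }

    /** Behaviour the reporter expects: delete markers are subject to the time range too. */
    static List<Cell> rawScanExpected(List<Cell> cells, long min, long max) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            if (inRange(c.timestamp, min, max)) {
                out.add(c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long t = 100L; // the delete happened at time stamp T
        List<Cell> cells = Arrays.asList(
                new Cell(50L, false), // an earlier put
                new Cell(t, true));   // the delete marker at T

        // Scan with time range (0, T-1), as in the report.
        System.out.println(rawScanAsReported(cells, 0L, t - 1).size()); // prints 2
        System.out.println(rawScanExpected(cells, 0L, t - 1).size());   // prints 1
    }
}
```

In this model the only difference between the two methods is whether the delete-marker check short-circuits the time-range check, which is the ordering the quoted if-statement suggests.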