Date: Wed, 1 Mar 2017 03:51:45 +0000 (UTC)
From: "Guanghao Zhang (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

    [ https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889451#comment-15889451 ]

Guanghao Zhang commented on HBASE-17125:
----------------------------------------

I think this should be fixed with the second idea:
1. Check by the max versions of this column (the HCD's maxVersions).
2. Check the KV with the filter.
3. Check the number of versions the user needs (scan.maxVersions).

[~huaxiang] [~saint.ack@gmail.com] [~apurtell] [~lhofhansl] [~ram_krish] [~Apache9] What do you think about this idea?

> Inconsistent result when use filter to read data
> ------------------------------------------------
>
>                 Key: HBASE-17125
>                 URL: https://issues.apache.org/jira/browse/HBASE-17125
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Guanghao Zhang
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: example.diff
>
>
> Assume a column's max versions is 3, and we write 4 versions of this column. The oldest version is not removed immediately, but from the user's point of view the oldest version is gone.
> When a user queries with a filter and the filter skips a newer version, the oldest version becomes visible again. After the region is compacted, however, the oldest version is never seen again. This is confusing for users: the same query returns inconsistent results before and after region compaction.
> The cause is the matchColumn method of UserScanQueryMatcher. It first checks the cell with the filter, then checks the number of versions needed. So if the filter skips a newer version, the oldest version is seen again as long as it has not been removed.
> We had an offline discussion with [~Apache9] and [~fenghh], and we now have two solutions for this problem. The first idea is to check the number of versions first, then check the cell with the filter. This matches the javadoc of setFilter, which says the filter is called after all tests for ttl, column match, deletes and max versions have been run:
> {code}
> /**
>  * Apply the specified server-side filter when performing the Query.
>  * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>  * for ttl, column match, deletes and max versions have been run.
>  * @param filter filter to run on the server
>  * @return this for invocation chaining
>  */
> public Query setFilter(Filter filter) {
>   this.filter = filter;
>   return this;
> }
> {code}
> But this idea has another problem. Suppose a column's max versions is 5 and the user's query needs only 3 versions. Because the number of versions is checked first and the filter second, the result may contain fewer than 3 cells, even though 2 more versions exist that are never read.
> So the second idea has three steps:
> 1. Check by the max versions of this column.
> 2. Check the KV with the filter.
> 3. Check the number of versions the user needs.
> But this will make ScanQueryMatcher more complicated, and it will break the javadoc of Query.setFilter.
> We don't have a final solution for this problem yet. Suggestions are welcome.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
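To make the ordering question in this issue concrete, here is a toy model in Java of the three orderings: the current one (filter, then versions), the first idea (versions, then filter), and the second idea (column max versions, then filter, then the scan's max versions). It is only a sketch under simplifying assumptions, not HBase's UserScanQueryMatcher or ColumnTracker code: one column, cells reduced to their timestamps (newest first), no deletes or TTL, a filter modeled as a plain LongPredicate, and the current version limit assumed to be min(column max versions, scan max versions). The class and method names (VersionFilterOrderDemo, filterThenVersions, versionsThenFilter, threeStep) are hypothetical. List.of needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

/** Hypothetical toy model, NOT HBase code. Cells are timestamps, newest first. */
public class VersionFilterOrderDemo {

  // Current order described in the issue: filter first, then count versions.
  // Assumption: the version limit is min(column max versions, scan max versions).
  static List<Long> filterThenVersions(List<Long> newestFirst, LongPredicate include,
      int columnMaxVersions, int scanMaxVersions) {
    int limit = Math.min(columnMaxVersions, scanMaxVersions);
    List<Long> result = new ArrayList<>();
    for (long ts : newestFirst) {
      if (!include.test(ts)) {
        continue; // a skipped cell does not use up a version slot
      }
      if (result.size() == limit) {
        break;
      }
      result.add(ts);
    }
    return result;
  }

  // First idea: every cell counts toward the version limit before the filter runs.
  static List<Long> versionsThenFilter(List<Long> newestFirst, LongPredicate include,
      int columnMaxVersions, int scanMaxVersions) {
    int limit = Math.min(columnMaxVersions, scanMaxVersions);
    List<Long> result = new ArrayList<>();
    int counted = 0;
    for (long ts : newestFirst) {
      if (counted == limit) {
        break;
      }
      counted++; // uses up a slot even if the filter rejects the cell
      if (include.test(ts)) {
        result.add(ts);
      }
    }
    return result;
  }

  // Second idea: column max versions, then filter, then the scan's max versions.
  static List<Long> threeStep(List<Long> newestFirst, LongPredicate include,
      int columnMaxVersions, int scanMaxVersions) {
    List<Long> result = new ArrayList<>();
    int seen = 0;
    for (long ts : newestFirst) {
      if (++seen > columnMaxVersions) {
        break; // step 1: stale version that a compaction would remove
      }
      if (!include.test(ts)) {
        continue; // step 2: the filter rejects this cell
      }
      if (result.size() == scanMaxVersions) {
        break; // step 3: the user asked for no more versions
      }
      result.add(ts);
    }
    return result;
  }

  public static void main(String[] args) {
    // Issue scenario: column max versions = 3, 4 versions written,
    // and a filter that skips the newest version (timestamp 4).
    LongPredicate skipTs4 = ts -> ts != 4;
    List<Long> beforeCompaction = List.of(4L, 3L, 2L, 1L);
    List<Long> afterCompaction = List.of(4L, 3L, 2L); // ts=1 was compacted away
    System.out.println(filterThenVersions(beforeCompaction, skipTs4, 3, 3)); // [3, 2, 1]
    System.out.println(filterThenVersions(afterCompaction, skipTs4, 3, 3));  // [3, 2]
    System.out.println(threeStep(beforeCompaction, skipTs4, 3, 3));          // [3, 2]
    System.out.println(threeStep(afterCompaction, skipTs4, 3, 3));           // [3, 2]

    // First idea's problem: column max versions = 5, scan wants 3,
    // and the filter skips the newest version (timestamp 5).
    LongPredicate skipTs5 = ts -> ts != 5;
    List<Long> fiveVersions = List.of(5L, 4L, 3L, 2L, 1L);
    System.out.println(versionsThenFilter(fiveVersions, skipTs5, 5, 3)); // [4, 3]
    System.out.println(threeStep(fiveVersions, skipTs5, 5, 3));          // [4, 3, 2]
  }
}
```

In this model, the current order returns [3, 2, 1] before compaction and [3, 2] after, which reproduces the inconsistency, while the three-step order returns [3, 2] both times. In the second scenario the first idea returns only [4, 3] although the scan asked for 3 versions, whereas the three-step order returns [4, 3, 2].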