Date: Tue, 26 Aug 2014 09:13:06 +0800
Subject: Re: Should scan check the limitation of the number of versions?
From: tobe
To: hbase-dev, lars hofhansl
Cc: "user@hbase.apache.org"

@lars I have set {KEEP_DELETED_CELLS => 'false'} on that table, and I still
see the same behavior: the older version is returned until I manually run
`flush`. You can try the commands I gave; it reproduces 100% of the time.

On Tue, Aug 26, 2014 at 2:20 AM, lars hofhansl wrote:
> Queries of past time ranges only work correctly when KEEP_DELETED_CELLS is
> enabled for the column families.
>
> ________________________________
> From: tobe
> To: hbase-dev
> Cc: "user@hbase.apache.org"
> Sent: Monday, August 25, 2014 4:32 AM
> Subject: Re: Should scan check the limitation of the number of versions?
>
> I haven't read the code deeply, but I have an idea (not sure whether it's
> right). When we scan the columns, we skip the cells that don't match (e.g.
> deleted ones). Could we use a counter to record this? For each skipped cell
> we add one, until the count reaches the configured number of versions. But
> we would also have to consider MVCC and other factors, which makes this more
> complex.
>
> On Mon, Aug 25, 2014 at 5:54 PM, tobe wrote:
>
> > So far, I have found two problems with this.
> >
> > Firstly, HBase-11675. It's a little tricky and rarely happens, but it
> > means users have to be careful about compactions, which happen on the
> > server side: they may get different results before and after a major
> > compaction.
> >
> > Secondly, suppose you put a value with timestamp 100, then put another
> > value in the same column with timestamp 200, with the number of versions
> > set to 1. When we get this column, we get the latest value (timestamp
> > 200), which is correct. But if I get with a time range from 0 to 150, I
> > may get the first value (timestamp 100) before a compaction happens. After
> > the compaction, you will never get this value again, even if you run the
> > same command.
> >
> > It's easy to reproduce; follow these steps:
> >
> > hbase(main):001:0> create "table", "cf"
> > hbase(main):003:0> put "table", "row1", "cf:a", "value1", 100
> > hbase(main):003:0> put "table", "row1", "cf:a", "value1", 200
> > hbase(main):026:0> get "table", "row1", {TIMERANGE => [0, 150]}  # before flush
> > row1    column=cf:a, timestamp=100, value=value1
> > hbase(main):060:0> flush "table"
> > hbase(main):082:0> get "table", "row1", {TIMERANGE => [0, 150]}  # after flush
> > 0 row(s) in 0.0050 seconds
> >
> > I think the reason is that we have three restrictions for removing data:
> > deletes, TTL and versions. Every time we get or scan data, we check the
> > delete markers and the TTL to make sure such data is not returned to
> > users. But for versions, we don't check this limit; our output relies on
> > compaction to clean up the excess versions. Is it possible to add this
> > check to scan (get is implemented as a scan)?
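The thread describes an interaction between the time range and the version limit. A small, self-contained Java sketch can model it. This is not HBase code, and it makes several simplifying assumptions:

- The class and method names (`VersionScanSketch`, `getFilterThenLimit`, `pruneToMaxVersions`, `getCountThenFilter`) are hypothetical.
- A column is reduced to a list of timestamps.
- The time range is treated as half-open [min, max).
- A flush or compaction is assumed to simply keep the newest VERSIONS versions.
- Delete markers, TTL, KEEP_DELETED_CELLS and MVCC are ignored.

It contrasts the behavior observed in the transcript with one possible reading of the counter idea.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of ONE column's versions, identified only by timestamp.
// Not HBase code: no delete markers, TTL, KEEP_DELETED_CELLS or MVCC.
class VersionScanSketch {

    // Versions are read newest first, as a scanner would see them.
    static List<Long> newestFirst(List<Long> timestamps) {
        List<Long> copy = new ArrayList<>(timestamps);
        copy.sort(Comparator.reverseOrder());
        return copy;
    }

    // Observed behavior: cells outside [minTs, maxTs) are skipped without
    // being counted, and the version limit applies only to what remains.
    static List<Long> getFilterThenLimit(List<Long> stored, long minTs, long maxTs, int maxVersions) {
        List<Long> result = new ArrayList<>();
        for (long ts : newestFirst(stored)) {
            if (ts < minTs || ts >= maxTs) {
                continue; // outside the time range: skipped, not counted
            }
            if (result.size() >= maxVersions) {
                break;
            }
            result.add(ts);
        }
        return result;
    }

    // Assumed effect of a flush or compaction in this model:
    // only the newest maxVersions versions survive.
    static List<Long> pruneToMaxVersions(List<Long> stored, int maxVersions) {
        List<Long> sorted = newestFirst(stored);
        return new ArrayList<>(sorted.subList(0, Math.min(maxVersions, sorted.size())));
    }

    // One reading of the counter idea: every version seen counts toward the
    // limit, including versions skipped for falling outside the time range.
    static List<Long> getCountThenFilter(List<Long> stored, long minTs, long maxTs, int maxVersions) {
        List<Long> result = new ArrayList<>();
        int seen = 0;
        for (long ts : newestFirst(stored)) {
            if (seen >= maxVersions) {
                break; // older than the newest maxVersions versions
            }
            seen++;
            if (ts >= minTs && ts < maxTs) {
                result.add(ts);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> stored = List.of(100L, 200L); // two puts, VERSIONS => 1
        System.out.println(getFilterThenLimit(stored, 0, 150, 1));                         // before flush
        System.out.println(getFilterThenLimit(pruneToMaxVersions(stored, 1), 0, 150, 1)); // after flush
        System.out.println(getCountThenFilter(stored, 0, 150, 1));                         // counter idea
    }
}
```

Running `main` prints `[100]`, `[]` and `[]`. Filtering before limiting returns the timestamp-100 version only until pruning removes it, which matches the shell transcript. Counting every version first gives the same empty result before and after pruning. The trade-off is that, in this model, old versions become invisible as soon as enough newer versions exist, rather than when a flush or compaction runs.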