Subject: Re: IndexedTable puts removing index rows for updated timestamped values?
From: Clint Morgan
To: hbase-user@hadoop.apache.org
Date: Mon, 29 Mar 2010 09:25:39 -0700

Definitely not the expected behavior, and it does not sound like user error.

From a quick skim, it looks like it's https://issues.apache.org/jira/browse/HBASE-2286. HBase does not gracefully handle the case where a put that follows a delete has the same millisecond timestamp as that delete. The indexed table contrib was using this delete-then-put pattern to maintain indexes. The above JIRA works around it.

NOTE: The current patch has a bug: if you delete only an "additionalColumn" in the base table, it does not get deleted in the index. I'll put up a fix for that shortly.

On Mon, Mar 29, 2010 at 7:52 AM, George Stathis wrote:
> Hey folks,
>
> I hope this is just user error, but I wanted to see if folks have encountered
> this scenario using IndexedTable. We followed the now well-known article
> on how to set up secondary indexes
> (http://rajeev1982.blogspot.com/2009/06/secondary-indexes-in-hbase.html).
> It works OK on the first test inserts, but we are noticing an unexpected
> behavior that I'll try to illustrate with the following example:
>
> - Assume a table 'foo' with a column family 'bar', a generic
>   qualifier 'bar:myColumn' and an indexed qualifier 'bar:myIndex'.
> - We thus have two actual tables, 'foo' and 'foo-myIndex'.
> - Assume a single put on table 'foo' that produces one row in 'foo' and one
>   index row in 'foo-myIndex'.
> - Assume a second put on table 'foo' for the same row as above that updates
>   the qualifier 'bar:myColumn' but leaves 'bar:myIndex' as is. Both values get
>   updated timestamps.
> - We are noticing that around 50% of the time this scenario is executed, the
>   index row in 'foo-myIndex' disappears even though the 'bar:myIndex' value
>   was not changed. Again, this behavior does not reproduce reliably; it takes
>   several attempts to see it.
> - We are noticing that if we submit the second put without the 'bar:myIndex'
>   cell, the index row in 'foo-myIndex' is left alone.
>
> We are seeing this happen even if we move 'bar:myIndex' to a new
> column family, 'bar2:myIndex', that only allows one version per cell. So
> basically, if the timestamp changes, there is a risk of losing the index
> entry, regardless of whether the cell value was changed. Is this expected
> behavior?
>
> -GS
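The failure mode described above can be illustrated with a toy model. This is a minimal sketch, not the HBase client API or the IndexedTable contrib code. `ToyCellStore`, `index_key`, `update_index` and `run_scenario` are hypothetical names. The model assumes only the HBase rule that a delete marker at timestamp T masks every version with a timestamp <= T, even versions written after the marker. Index maintenance is modeled as "delete the old index row, then put the new one." When the indexed value is unchanged, both operations hit the same index row. Whether that row survives then depends on whether the clock ticked between the two operations, which would be consistent with the intermittent loss George reports.

```python
from collections import defaultdict


class ToyCellStore:
    """Toy model of HBase-style versioned rows (illustrative assumption, not the
    HBase client API). A delete marker at timestamp T hides every version of the
    row with timestamp <= T, including versions written after the marker."""

    def __init__(self):
        self.versions = defaultdict(list)        # row key -> [(ts, value), ...]
        self.delete_markers = defaultdict(list)  # row key -> [ts, ...]

    def put(self, row, value, ts):
        self.versions[row].append((ts, value))

    def delete(self, row, ts):
        self.delete_markers[row].append(ts)

    def get(self, row):
        """Return the newest value not masked by a delete marker, or None."""
        newest_marker = max(self.delete_markers[row], default=None)
        visible = [(ts, v) for ts, v in self.versions[row]
                   if newest_marker is None or ts > newest_marker]
        if not visible:
            return None
        return max(visible, key=lambda tv: tv[0])[1]


def index_key(indexed_value, base_row):
    # Hypothetical index row layout: indexed value followed by the base row key.
    return f"{indexed_value}/{base_row}"


def update_index(index, base_row, old_value, new_value, delete_ts, put_ts):
    """Hypothetical delete-then-put index maintenance: remove the old index row,
    then write the new one. If the indexed value is unchanged, both operations
    target the same index row key."""
    if old_value is not None:
        index.delete(index_key(old_value, base_row), delete_ts)
    index.put(index_key(new_value, base_row), base_row, put_ts)


def run_scenario(delete_ts, put_ts):
    index = ToyCellStore()
    # First put on base row 'row1': 'bar:myIndex' = 'v1', index row created.
    update_index(index, "row1", None, "v1", delete_ts=None, put_ts=100)
    # Second put: only 'bar:myColumn' changes, 'bar:myIndex' is still 'v1'.
    update_index(index, "row1", "v1", "v1", delete_ts=delete_ts, put_ts=put_ts)
    return index.get(index_key("v1", "row1"))


# Clock ticked between the delete and the put: the index row survives.
print(run_scenario(delete_ts=105, put_ts=106))  # row1
# Delete and put landed in the same millisecond: the put is masked.
print(run_scenario(delete_ts=105, put_ts=105))  # None
```

In this model, any `put_ts` strictly greater than `delete_ts` keeps the index row. One hedged workaround is therefore to force the index put at least one millisecond past the delete. The thread does not say whether HBASE-2286 takes this exact approach.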