Mailing-List: contact hadoop-dev-help@lucene.apache.org; run by ezmlm
Reply-To: hadoop-dev@lucene.apache.org
Message-ID: <28656619.1200529234568.JavaMail.jira@brutus>
Date: Wed, 16 Jan 2008 16:20:34 -0800 (PST)
From: "stack (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-2513) [hbase] HStore#get and HStore#getFull may not return expected values by timestamp when there is more than one MapFile
In-Reply-To: <2920092.1199322515294.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8

    [ https://issues.apache.org/jira/browse/HADOOP-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12559757#action_12559757
]

stack commented on HADOOP-2513:
-------------------------------

Patch looks good, but it's going to kill performance, so I want to run a basic PE test before committing just to see how much it's going to cost us.

Chatting w/ Bryan, scanners likely have the same issue. This patch doesn't address that.

> [hbase] HStore#get and HStore#getFull may not return expected values by timestamp when there is more than one MapFile
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2513
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2513
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: Bryan Duxbury
>            Assignee: stack
>             Fix For: 0.16.0
>
>         Attachments: 2512-v2.patch, 2513.patch
>
>
> Ok, this one is a little tricky. Let's say that you write a row with some value without a timestamp, which means "right now". Then the memcache gets flushed out to a MapFile. Then you write another value to the same row, this time with a timestamp that is in the past, i.e., before the "now" timestamp of the first put.
> Some time later, but before there is a compaction, if you do a get for this row and ask for only a single version, you will logically expect the latest version of the cell, which you would assume is the one written at "now" time. Instead, you will get the value written into the "past" cell, because even though it is tagged as having happened in the past, it actually *was written* after the "now" cell, and thus when #get searches for satisfying values, it runs into the most recently written one first.
> The result of this problem is inconsistent results. Note that this problem only ever exists when there's an uncompacted HStore, because during compaction these cells all get sorted into the correct order by timestamp.
> In a way, this actually makes the problem worse, because you could then easily get inconsistent results from HBase about the same (unchanged) row depending on whether there's been a flush/compaction.
> The only solution I can think of for this problem at the moment is to scan all the MapFiles and the Memcache for possible results, sort them, and then select the desired number of versions off the top. This is unfortunate because it means you never get the snazzy short-circuit logic except within a single MapFile or the Memcache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
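
For readers of the archive, the failure mode and the scan-sort-select fix described in the issue can be sketched as below. This is only an illustration under stated assumptions, not the attached patch and not HBase code: `VersionMerge`, `Cell`, `getShortCircuit`, and `getMerged` are hypothetical names, and each store (the memcache or one MapFile) is modeled as a plain in-memory list, ordered from most recently written store to oldest.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; none of these names come from HBase itself.
final class VersionMerge {

    /** One stored version of a cell: its timestamp and its value. */
    static final class Cell {
        public final long timestamp;
        public final String value;

        public Cell(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /**
     * Short-circuit lookup: walk the stores in write order (most recently
     * written first) and stop as soon as enough versions have been found.
     * This is the behaviour the issue describes as returning the wrong cell.
     */
    static List<String> getShortCircuit(List<List<Cell>> storesNewestFirst, int versions) {
        List<String> out = new ArrayList<>();
        for (List<Cell> store : storesNewestFirst) {
            for (Cell c : store) {
                if (out.size() == versions) {
                    return out;
                }
                out.add(c.value);
            }
        }
        return out;
    }

    /**
     * Scan-sort-select lookup: gather candidates from every store, sort them
     * by timestamp (newest first), then take the requested number of versions.
     */
    static List<String> getMerged(List<List<Cell>> stores, int versions) {
        List<Cell> all = new ArrayList<>();
        for (List<Cell> store : stores) {
            all.addAll(store);
        }
        all.sort(Comparator.comparingLong((Cell c) -> c.timestamp).reversed());

        List<String> out = new ArrayList<>();
        for (Cell c : all) {
            if (out.size() == versions) {
                break;
            }
            out.add(c.value);
        }
        return out;
    }
}
```

With a memcache holding a cell at timestamp 100 (written later) and a flushed MapFile holding a cell at timestamp 200 (written earlier), asking for one version via `getShortCircuit` yields the timestamp-100 value, while `getMerged` yields the timestamp-200 value. The cost is the one noted in the comment above: every store is consulted on every get.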