Message-ID: <1384638725.87961267757247225.JavaMail.jira@brutus.apache.org>
Date: Fri, 5 Mar 2010 02:47:27 +0000 (UTC)
From: "Todd Lipcon (JIRA)"
To: hbase-dev@hadoop.apache.org
Subject: [jira] Commented: (HBASE-2248) Provide new non-copy mechanism to assure atomic reads in get and scan
In-Reply-To: <174857300.448551266881787938.JavaMail.jira@brutus.apache.org>

    [ https://issues.apache.org/jira/browse/HBASE-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841669#action_12841669 ]

Todd Lipcon
commented on HBASE-2248:
------------------------------------

Hey Ryan,

I looked over this patch a bit this afternoon. It's clever, but I think it can result in loss of read-your-own-writes consistency for a single client. Consider this scenario:

|| Action || Read # || Write # || memstoreRead || memstoreWrite ||
| Client A begins a put on row R | - | 1 | 0 | 1 |
| Client B begins a put on row S | - | 2 | 0 | 2 |
| Client B finishes a put on row S | - | - | 0 | 2 |
| Client B initiates a get on row S | 0 | - | 0 | 2 |

So, since client A's put #1 is still ongoing on a separate row, memstoreRead stays at 0 and client B is unable to read version #2 of its own row.

I think dropping consistency below read-your-own-writes is bad, even though the above situation would be rare. Under high throughput I think it can occur, and it could be very bad if people are relying on this level of consistency to implement transactions, etc.

One possible solution is for completeMemstoreInsert to spin until memstoreRead >= e.getWriteNumber(). Given that it only has to wait for other concurrent writers to finish, a spin on memstoreRead.get() should only take a few cycles and be reasonably efficient.

I'll think a bit about whether there are other possible solutions.

> Provide new non-copy mechanism to assure atomic reads in get and scan
> ---------------------------------------------------------------------
>
>                 Key: HBASE-2248
>                 URL: https://issues.apache.org/jira/browse/HBASE-2248
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.3
>            Reporter: Dave Latham
>             Fix For: 0.20.4
>
>         Attachments: HBASE-2248-demonstrate-previous-impl-bugs.patch, HBASE-2248-ryan.patch, hbase-2248.gc, HBASE-2248.patch, Screen shot 2010-02-23 at 10.33.38 AM.png, threads.txt
>
>
> HBASE-2037 introduced a new MemStoreScanner which triggers a ConcurrentSkipListMap.buildFromSorted clone of the memstore and snapshot when starting a scan.
> After upgrading to 0.20.3, we noticed a big slowdown in our use of short scans.  Some of our data represent a time series.  The data is stored in time series order, MR jobs often insert/update new data at the end of the series, and queries usually have to pick up some or all of the series.  These are often scans of 0-100 rows at a time.  To load one page, we'll observe about 20 such scans being triggered concurrently, and they take 2 seconds to complete.  Doing a thread dump of a region server shows many threads in ConcurrentSkipListMap.buildFromSorted, which traverses the entire map of key values to copy it.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
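
To make the scenario in the comment concrete, here is a rough Java sketch of the spin-wait idea. It is not the code from HBASE-2248-ryan.patch: only completeMemstoreInsert, memstoreRead, memstoreWrite and getWriteNumber() are names taken from the comment. The class name RwccSketch, the WriteEntry shape, beginMemstoreInsert, completeWithoutWaiting, readPoint and the queue-based bookkeeping are all assumptions made for illustration.

```java
import java.util.LinkedList;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only; names other than completeMemstoreInsert,
// memstoreRead, memstoreWrite and getWriteNumber() are hypothetical.
class RwccSketch {
    /** Handle for one in-flight write. */
    static final class WriteEntry {
        private final long writeNumber;
        private boolean completed = false; // guarded by the writeQueue lock

        WriteEntry(long writeNumber) { this.writeNumber = writeNumber; }

        long getWriteNumber() { return writeNumber; }
    }

    // Highest write number that readers are allowed to see.
    private final AtomicLong memstoreRead = new AtomicLong(0);
    // Last write number handed out; guarded by the writeQueue lock.
    private long memstoreWrite = 0;
    // In-flight writes, in write-number order.
    private final LinkedList<WriteEntry> writeQueue = new LinkedList<>();

    /** Assigns the next write number to a new in-flight write. */
    WriteEntry beginMemstoreInsert() {
        synchronized (writeQueue) {
            WriteEntry e = new WriteEntry(++memstoreWrite);
            writeQueue.add(e);
            return e;
        }
    }

    /**
     * Marks e as done and advances memstoreRead over the leading run of
     * completed writes. Returns immediately, so memstoreRead may still be
     * below e's write number if an earlier write is unfinished.
     */
    void completeWithoutWaiting(WriteEntry e) {
        synchronized (writeQueue) {
            e.completed = true;
            while (!writeQueue.isEmpty() && writeQueue.getFirst().completed) {
                memstoreRead.set(writeQueue.removeFirst().getWriteNumber());
            }
        }
    }

    /**
     * The suggested variant: after marking e as done, spin until the read
     * point has caught up with e, so the caller can read its own write.
     */
    void completeMemstoreInsert(WriteEntry e) {
        completeWithoutWaiting(e);
        while (memstoreRead.get() < e.getWriteNumber()) {
            Thread.yield(); // wait for earlier concurrent writers to finish
        }
    }

    /** Read number that a get or scan starting now would use. */
    long readPoint() {
        return memstoreRead.get();
    }
}
```

In terms of the table: if client A and client B each call beginMemstoreInsert (write numbers 1 and 2) and B then finishes via completeWithoutWaiting, readPoint() is still 0, which is the stale read described above. With completeMemstoreInsert, B's call does not return until A has also completed, so a get issued by B afterwards uses a read point of at least 2. The cost is that B's put latency now depends on the slowest earlier writer.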