Message-ID: <17449085.1238541950503.JavaMail.jira@brutus>
Date: Tue, 31 Mar 2009 16:25:50 -0700 (PDT)
From: "Erik Holstad (JIRA)"
To: hbase-dev@hadoop.apache.org
Subject: [jira] Commented: (HBASE-74) [performance] When a get or scan request spans multiple columns, execute the reads in parallel

[ https://issues.apache.org/jira/browse/HBASE-74?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694320#action_12694320 ]

Erik Holstad commented on 
HBASE-74:
-----------------------------------

When working on HBASE-1249 this thought came to mind, so I tried to design the new system so that it would be pretty easy to add this. There are still a few things that need to be done to make this work properly, and I haven't run any tests to see how much we would gain.

There are a couple of places where the work can be parallelized. If the query is a [Get], we can do things in parallel in multiple places:

HRegionServer: every get can be done in parallel.
HRegion: every family in the get can be done in parallel.
HStore: every read from memCache + storefiles can be done in parallel.

Starting at the bottom, to support parallel computation of the data in HStore you have a bunch of lists that you need to compare:
1. The data list in the sf (storefile)
2. The get list, with the families and columns to look for
3. The result
4. The deletes from previous sfs
5. The deletes from this read
In short: data, get, result, oldDeletes, newDeletes.

With the current layout, where puts and deletes are mixed, you can:
1. Compare the data in the different sfs with the get and create a list of candidates and a list of new deletes for that sf. The compare includes checks for TimeRange, TTL and number of versions.
2. Merge the deletes one by one, starting at the memCache and moving down the sfs. For every merge you send that new delete list into the serverGet it belongs to, then move on to the merge with the next new delete list.
3. When all delete checks are done you are left with your candidate lists from all the sfs; they now need to be merged and checked for number of versions.

So you have:
1. Get candidates and new deletes
2. Merge deletes and check the sgets against the merged deletes
3. Merge candidates

For the parallel case you have a list of sgets with the same data:

// This call can be threaded
1. sget.createCandidates(List data, boolean multiFamily)
2. 
for (int i = 0; i < sgets.size(); i++) { ... }

Doing this can probably increase speed in a lot of cases, but I think it will have the biggest impact on the GetFamilies query, rather than getFull, since for that query you need to look in all the storefiles anyway, which might not be the case for other queries. I don't think it would be too hard to thread the gets from different families, especially now that we don't need to sort the result on the client side but can just append it to the list. Threading multiple gets shouldn't be too hard either.

> [performance] When a get or scan request spans multiple columns, execute the reads in parallel
> ----------------------------------------------------------------------------------------------
>
>          Key: HBASE-74
>          URL: https://issues.apache.org/jira/browse/HBASE-74
>      Project: Hadoop HBase
>   Issue Type: Improvement
>   Components: regionserver
>     Reporter: Jim Kellerman
>     Priority: Critical
>      Fix For: 0.20.0
>
> When a get or scan request spans multiple columns, execute the reads in parallel and use a CountDownLatch to wait for them to complete before returning the results.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
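The per-family parallelism discussed above (each family read threaded, results appended without client-side sorting, and a CountDownLatch used to wait for completion as the issue suggests) could be sketched along these lines. This is only an illustration: the store map, readFamily, and ParallelFamilyGet are made-up names, not real HBase APIs.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of HBASE-74's idea: read each column family in parallel and
// wait on a CountDownLatch before returning. All names here are
// hypothetical stand-ins for the real regionserver code paths.
public class ParallelFamilyGet {
    // Pretend store: one list of cells per column family.
    static final Map<String, List<String>> STORE = Map.of(
        "info",  List.of("info:name=foo", "info:age=42"),
        "stats", List.of("stats:hits=7"));

    // Simulated single-family read (in HBase this would go through
    // HStore, hitting the memCache and the storefiles).
    static List<String> readFamily(String family) {
        return STORE.getOrDefault(family, List.of());
    }

    // Read all requested families in parallel. No client-side sort is
    // needed; each worker just appends its cells as it finishes.
    static List<String> get(Collection<String> families) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(families.size());
        CountDownLatch latch = new CountDownLatch(families.size());
        List<String> result = Collections.synchronizedList(new ArrayList<>());
        for (String family : families) {
            pool.execute(() -> {
                try {
                    result.addAll(readFamily(family));
                } finally {
                    latch.countDown();   // signal this family read is done
                }
            });
        }
        latch.await();   // block until every family read has completed
        pool.shutdown();
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> out = get(List.of("info", "stats"));
        System.out.println(out.size() + " cells read across both families");
    }
}
```

The same shape would apply one level up (threading whole gets in HRegionServer) or one level down (threading the memCache + storefile reads inside HStore); only the unit of work handed to the pool changes.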