From: "Jonathan Gray (JIRA)"
To: hbase-dev@hadoop.apache.org
Date: Mon, 19 Jan 2009 16:19:02 -0800 (PST)
Subject: [jira] Commented: (HBASE-1141) Fetching large numbers of columns is slow outside of HDFS
Message-ID: <1963664045.1232410742104.JavaMail.jira@brutus>
In-Reply-To: <1708429866.1232410379585.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HBASE-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665288#action_12665288 ]

Jonathan Gray commented on HBASE-1141:
--------------------------------------

We're looking into this because our most common query is:

    get_all_columns(table, row, family)

When the family holds a small number of columns, our random access times are on the order of 2ms. But if there are a few thousand columns in the family, the same query can take >100ms.

Certainly there are some inefficiencies in a query like this, because you must check all stores, but even when serving out of memory (the new cache Erik is designing) there is a significant performance hit from having many columns.

Erik has done some timing and can post what he has found.

> Fetching large numbers of columns is slow outside of HDFS
> ---------------------------------------------------------
>
>                 Key: HBASE-1141
>                 URL: https://issues.apache.org/jira/browse/HBASE-1141
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Jonathan Gray
>             Fix For: 0.20.0
>
>
> While working on a Cell cache, we have found during random-read tests that the number of columns has an enormous impact on performance. Accounting for increased HDFS access time, there is still a great deal of time being spent coming out of the Region and then across the wire to HTable.
> Erik Holstad has done this testing and will post some of his results here when completed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
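
Editor's illustration (not part of the original message): the comment notes that a get_all_columns query "must check all stores". The sketch below is a toy model of that idea only, not HBase code. The store layout (a dict mapping (row, family) to a column-sorted list of (column, timestamp, value) cells) and the function body are assumptions made for illustration. It shows why the work grows with the total number of cells across stores: every store contributes a run that has to be walked and merged.

```python
import heapq


def get_all_columns(stores, row, family):
    """Toy model of a whole-family read (hypothetical, not the HBase API).

    Each store is assumed to be a dict mapping (row, family) to a list of
    (column, timestamp, value) cells sorted by column name.
    """
    # Every store has to be consulted; none can be skipped for this query.
    runs = [store.get((row, family), []) for store in stores]

    newest = {}
    # Merge the sorted runs in column order. The loop visits every cell in
    # every store, so the cost scales with the total column count.
    for column, timestamp, value in heapq.merge(*runs, key=lambda cell: cell[0]):
        seen = newest.get(column)
        if seen is None or timestamp > seen[0]:
            newest[column] = (timestamp, value)

    # Keep only the newest value for each column.
    return {column: value for column, (_, value) in newest.items()}


# Usage: an in-memory store and one older on-disk store (both made up).
memstore = {("row1", "info"): [("a", 3, "new-a"), ("c", 3, "c")]}
storefile = {("row1", "info"): [("a", 1, "old-a"), ("b", 1, "b")]}
print(get_all_columns([memstore, storefile], "row1", "info"))
```

In this model a family with a few thousand columns spread over several stores means a few thousand merge steps per store for a single read, which is consistent with the direction of the slowdown described above, though it says nothing about the actual timings.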