Mailing-List: contact hbase-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hbase-dev@hadoop.apache.org
Message-ID: <1963664045.1232410742104.JavaMail.jira@brutus>
Date: Mon, 19 Jan 2009 16:19:02 -0800 (PST)
From: "Jonathan Gray (JIRA)" <jira@apache.org>
To: hbase-dev@hadoop.apache.org
Subject: [jira] Commented: (HBASE-1141) Fetching large numbers of columns is
 slow outside of HDFS
In-Reply-To: <1708429866.1232410379585.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HBASE-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665288#action_12665288 ] 

Jonathan Gray commented on HBASE-1141:
--------------------------------------

We're looking into this because our most common query is:  get_all_columns(table, row, family)

When the family holds only a small number of columns, our random-access times are on the order of 2 ms.  With a few thousand columns in the family, the same query can take more than 100 ms.

A query like this is certainly somewhat inefficient, because every store must be checked.  But even when serving entirely out of memory (the new cache Erik is designing), having many columns carries a significant performance hit.
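
To illustrate why every store has to be checked, here is a minimal Python sketch of a get_all_columns-style lookup. It is not HBase code: the Store type, the in-memory dictionaries, and the merge-by-timestamp rule are all assumptions made for illustration. It only shows that the work grows with the number of columns times the number of stores, since any store may hold the newest version of any column.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for one store (memcache or a store file):
# (row, family) -> {column qualifier: (timestamp, value)}
Store = Dict[Tuple[str, str], Dict[str, Tuple[int, bytes]]]


def get_all_columns(stores: List[Store], row: str, family: str) -> Dict[str, bytes]:
    """Return the newest value of every column in `family` for `row`."""
    newest: Dict[str, Tuple[int, bytes]] = {}
    for store in stores:  # every store must be consulted
        for column, (ts, value) in store.get((row, family), {}).items():
            # Keep a column's value only if it is newer than what we have seen.
            if column not in newest or ts > newest[column][0]:
                newest[column] = (ts, value)
    return {column: value for column, (_, value) in newest.items()}


# Two assumed stores; column "a" appears in both with different timestamps.
stores = [
    {("row1", "info"): {"a": (2, b"new-a"), "b": (1, b"b")}},
    {("row1", "info"): {"a": (1, b"old-a"), "c": (1, b"c")}},
]
result = get_all_columns(stores, "row1", "info")
```

In this toy model the inner loop runs once per column per store, so a family with a few thousand columns does a few thousand comparisons per store before anything is serialized and sent to the client.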

Erik has done some timing and can post what he has found.

> Fetching large numbers of columns is slow outside of HDFS
> ---------------------------------------------------------
>
>                 Key: HBASE-1141
>                 URL: https://issues.apache.org/jira/browse/HBASE-1141
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Jonathan Gray
>             Fix For: 0.20.0
>
>
> While working on a Cell cache, we found during random-read tests that the number of columns has an enormous impact on performance.  Even after accounting for the increased HDFS access time, a great deal of time is still spent getting the data out of the Region and then across the wire to HTable.
> Erik Holstad ran these tests and will post some of his results here when they are complete.
