hbase-dev mailing list archives

From "Jonathan Gray (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1141) Fetching large numbers of columns is slow outside of HDFS
Date Tue, 20 Jan 2009 00:19:02 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665288#action_12665288 ]

Jonathan Gray commented on HBASE-1141:

We're looking into this because our most common query is get_all_columns(table, row, family), which fetches every column of one family for a single row.

When the family has a small number of columns, our random-access times are on the order of 2 ms.
But if there are a few thousand columns in the family, the same query can take more than 100 ms.

Some inefficiency is expected in a query like this, because every store must be checked,
but even when serving out of memory (the new cache Erik is designing), having many columns
carries a significant performance penalty.
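To illustrate why "you must check all stores" makes this query scale with column count, here is a minimal in-memory sketch. It is not HBase code: the classes `AllColumnsSketch`, `Store` and `Versioned` and the method `getAllColumns` are hypothetical, and it assumes a region is simply a list of stores (for example a memory store plus flushed files), each mapping row, family and qualifier to a timestamped value.

```java
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical model of get_all_columns(table, row, family): a region keeps
 * several stores, and one row's columns may be spread across all of them.
 */
public class AllColumnsSketch {

    /** A versioned value; the higher timestamp wins when stores disagree. */
    record Versioned(long timestamp, String value) {}

    /** One store: row -> family -> qualifier -> versioned value. */
    static final class Store {
        final Map<String, Map<String, NavigableMap<String, Versioned>>> rows = new TreeMap<>();

        void put(String row, String family, String qualifier, long ts, String value) {
            rows.computeIfAbsent(row, r -> new TreeMap<>())
                .computeIfAbsent(family, f -> new TreeMap<>())
                .put(qualifier, new Versioned(ts, value));
        }
    }

    /**
     * Visits every store and merges the family's columns, keeping the newest
     * version of each qualifier. The work grows with (stores x columns).
     */
    static NavigableMap<String, String> getAllColumns(List<Store> stores, String row, String family) {
        NavigableMap<String, Versioned> merged = new TreeMap<>();
        for (Store store : stores) {
            Map<String, NavigableMap<String, Versioned>> families = store.rows.get(row);
            if (families == null) continue;          // row absent from this store
            NavigableMap<String, Versioned> columns = families.get(family);
            if (columns == null) continue;           // family absent for this row
            for (Map.Entry<String, Versioned> e : columns.entrySet()) {
                // Keep whichever version of the qualifier is newest.
                merged.merge(e.getKey(), e.getValue(),
                    (a, b) -> a.timestamp() >= b.timestamp() ? a : b);
            }
        }
        NavigableMap<String, String> result = new TreeMap<>();
        merged.forEach((q, v) -> result.put(q, v.value()));
        return result;
    }
}
```

Every column in every store is touched once and inserted into a sorted map, so a family with a few thousand columns costs roughly a thousand times more merge work than one with a handful, even if all stores are in memory.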

Erik has done some timing and can post what he has found.

> Fetching large numbers of columns is slow outside of HDFS
> ---------------------------------------------------------
>                 Key: HBASE-1141
>                 URL: https://issues.apache.org/jira/browse/HBASE-1141
>             Project: Hadoop HBase
>          Issue Type: Bug
>    Affects Versions: 0.20.0
>            Reporter: Jonathan Gray
>             Fix For: 0.20.0
> While working on a Cell cache, we found during random-read tests that the number
> of columns has an enormous impact on performance.  Even after accounting for increased HDFS
> access time, a great deal of time is still spent coming out of the Region and then across the
> wire to HTable.
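The cost of moving many cells "across the wire to HTable" can be sketched with a hypothetical encoding. This is not HBase's actual RPC format: `WireSizeSketch`, `encodeRow` and `family` are illustrative names, and the layout (a cell count, then a length-prefixed qualifier, an 8-byte timestamp and a length-prefixed value per cell) is an assumption chosen only to show that a fixed per-cell overhead scales linearly with the number of columns.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class WireSizeSketch {

    /** Writes a length-prefixed UTF-8 string: a 4-byte length, then the bytes. */
    private static void writeBytes(DataOutputStream out, String s) throws IOException {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(b.length);
        out.write(b);
    }

    /**
     * Hypothetical response for one row's family: a 4-byte cell count, then per
     * cell a qualifier, an 8-byte timestamp and a value. Each cell pays 16 bytes
     * of framing (two 4-byte lengths plus the timestamp) on top of its data.
     */
    static byte[] encodeRow(SortedMap<String, String> columns, long timestamp) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(columns.size());
            for (Map.Entry<String, String> e : columns.entrySet()) {
                writeBytes(out, e.getKey());
                out.writeLong(timestamp);
                writeBytes(out, e.getValue());
            }
        } catch (IOException e) {
            // ByteArrayOutputStream does not actually throw; wrap just in case.
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Builds a family with n columns named 00000, 00001, ... each holding "v". */
    static SortedMap<String, String> family(int n) {
        SortedMap<String, String> cols = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            cols.put(String.format("%05d", i), "v");
        }
        return cols;
    }
}
```

With 5-byte qualifiers and 1-byte values, each cell costs 22 bytes, of which 16 are framing, so `encodeRow(family(3000), 1L)` produces 4 + 3000 × 22 = 66,004 bytes. Serialization time and per-cell object creation follow the same linear pattern under this assumed layout.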
> Erik Holstad has done this testing and will post some of his results here when completed.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
