hbase-user mailing list archives

From "shimon golan" <sgo...@gmail.com>
Subject Re: querying by column value rather than key value
Date Thu, 29 May 2008 14:56:22 GMT
Thanks, Bryan, for the quick response!
My table is very sparse and contains 100 column families, each containing
about 20 items. I also need to search the table by each of the columns, so
the solution you suggested seems somewhat complicated for this purpose.

So my follow-up questions are:
1. Do you plan to support indexes on columns, so that querying by a column
value no longer requires a scan?
2. I thought about using MapReduce to parallelize the search over the
different regions of the table. Could this be accomplished?

On Thu, May 29, 2008 at 5:42 PM, Bryan Duxbury <bryan@rapleaf.com> wrote:

> In a regular hbase table, there is no "most efficient" way to do this. The
> only operation you have available is a table scan.
>
> If you find yourself looking for rows by their column values, then you must
> do some extra work to make that possible. First, make absolutely sure that
> you have the right row key picked out. If your accesses are dominated by
> searching on a column value, then perhaps that column should be your primary
> key.
>
> If you must have both the existing primary key and the column-value-based
> lookups, then probably your best bet is to make an "index" table. The way
> that works is that every time you write or delete some value to the primary
> table, you also write or delete a value to the index table with the column
> value as the key and the row key as a column value. Then, when you are
> trying to find the row by its column value, you look in the index table to
> find the row key, and then you query the main table with the row key. It's
> more work, but this is the best that HBase can offer at the moment.
>
> -Bryan
>
>
>
> On May 29, 2008, at 5:47 AM, Shimon wrote:
>
>> Hi,
>> What is the most efficient way to retrieve a row when the value of a certain
>> column is specified (rather than the row key)?
>>
>> Thanks in advance
>>
>
>
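
The "index" table approach Bryan describes above can be sketched roughly as follows. This is only an in-memory illustration in Python, not HBase client code: the class name IndexedTable, its methods, and the "info:city" column are all made up for the example, and plain dicts stand in for the primary and index tables. It also assumes several rows may share the same column value, so each index entry holds a set of row keys.

```python
from collections import defaultdict


class IndexedTable:
    """Hypothetical in-memory stand-in for a primary table plus an index table."""

    def __init__(self):
        # Primary table: row key -> {column: value}
        self.primary = {}
        # Index table: (column, value) -> set of row keys holding that value
        self.index = defaultdict(set)

    def put(self, row_key, column, value):
        """Write to the primary table and mirror the write in the index table."""
        row = self.primary.setdefault(row_key, {})
        if column in row:
            # Overwriting: drop the index entry for the old value first.
            self._unindex(row_key, column, row[column])
        row[column] = value
        self.index[(column, value)].add(row_key)

    def delete(self, row_key, column):
        """Delete from the primary table and remove the matching index entry."""
        row = self.primary.get(row_key)
        if row is None or column not in row:
            return
        self._unindex(row_key, column, row.pop(column))
        if not row:
            del self.primary[row_key]

    def _unindex(self, row_key, column, value):
        keys = self.index.get((column, value))
        if keys is None:
            return
        keys.discard(row_key)
        if not keys:
            del self.index[(column, value)]

    def find_by_value(self, column, value):
        """Look up row keys in the index, then fetch the rows by key."""
        row_keys = self.index.get((column, value), set())
        return {key: self.primary[key] for key in sorted(row_keys)}


table = IndexedTable()
table.put("row1", "info:city", "Paris")
table.put("row2", "info:city", "Paris")
table.put("row1", "info:city", "Rome")  # overwrite moves row1 in the index
matches = table.find_by_value("info:city", "Paris")
```

Against a real cluster the two writes go to two separate tables, so the sketch glosses over what happens when one write succeeds and the other fails; the application would have to handle that case itself.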
