hbase-user mailing list archives

From TuX RaceR <tuxrace...@gmail.com>
Subject Re: IHBase indexes persistence
Date Sun, 21 Mar 2010 10:14:06 GMT
Thanks Dan for the detailed explanation ;) yes, I was suspecting that as 
the data gets bigger, having to query all regions would cause 
scalability issues.

Dan Washusen wrote:
>>
>> The feature provided by IHBase is very appealing. It seems to correspond to
>> a use case very common in applications (at least in mine ;) )
>>     
>
> The functionality of IHBase might not be as useful as you think.  Take
> the following very basic user table layout:
>
> username (key) | email | name | password
>
> That table layout works great when you want to find a user by
> username, for example, when the user logs in.  You can simply do a get
> on the table with the username.  Now you need to add functionality to
> enable a user to retrieve their forgotten password.  The seemingly
> obvious solution with IHBase would be to add a secondary index on the
> email column.  You could then perform a scan on the table with the
> appropriate index hint to fetch the user by their email address.  That
> solution would work while your dataset is small (one or two regions)
> but as your dataset grows and spans many hundreds of regions it's no
> longer a viable option.  The reason it's not a viable option is that
> IHBase maintains an index on the email column per region.  In order to
> find the row that has the email address you are looking for, the scan
> must contact every region.  The scan would still return reasonably
> quickly (say each region responded in a few milliseconds) but it's
> still far too resource intensive...
>
> The way to make scans fast in HBase is to provide a start row and stop
> row and the same rule applies to IHBase.  It's just that with IHBase
> the scan will return much faster if the start and stop rows span a
> large range...
>   
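
To make the point above concrete, here is a toy model in plain Python. It
is only a sketch and does not use the HBase or IHBase APIs: the Region and
Table classes, the split keys and the sample users are all made up for
illustration. It counts how many regions each kind of lookup has to
contact when the secondary index is kept per region.

```python
from bisect import bisect_right


class Region:
    """Toy region holding rows whose key falls in [start_key, end_key)."""

    def __init__(self, start_key, end_key):
        self.start_key = start_key
        self.end_key = end_key    # None means "no upper bound"
        self.rows = {}            # username -> email
        self.email_index = {}     # region-local index: email -> [usernames]

    def put(self, username, email):
        self.rows[username] = email
        self.email_index.setdefault(email, []).append(username)


class Table:
    """Toy table split into regions by row key (hypothetical, not HBase)."""

    def __init__(self, split_keys):
        self.starts = [""] + sorted(split_keys)
        ends = self.starts[1:] + [None]
        self.regions = [Region(s, e) for s, e in zip(self.starts, ends)]

    def region_for(self, key):
        # The last region whose start key is <= key owns the row.
        return self.regions[bisect_right(self.starts, key) - 1]

    def put(self, username, email):
        self.region_for(username).put(username, email)

    def get(self, username):
        """Lookup by row key: exactly one region is contacted."""
        return self.region_for(username).rows.get(username), 1

    def find_by_email(self, email):
        """Per-region index: every region must be asked."""
        matches, contacted = [], 0
        for region in self.regions:
            contacted += 1
            matches.extend(region.email_index.get(email, []))
        return matches, contacted

    def scan(self, start_row, stop_row):
        """Range scan: only regions overlapping [start_row, stop_row)."""
        matches, contacted = [], 0
        for region in self.regions:
            ends_after_start = region.end_key is None or region.end_key > start_row
            if region.start_key < stop_row and ends_after_start:
                contacted += 1
                matches.extend(
                    sorted(k for k in region.rows if start_row <= k < stop_row)
                )
        return matches, contacted


def build_sample_table():
    """Four regions, six made-up users."""
    table = Table(split_keys=["g", "n", "t"])
    for name in ["alice", "bob", "grace", "nina", "tom", "zed"]:
        table.put(name, name + "@example.com")
    return table


if __name__ == "__main__":
    table = build_sample_table()
    print(table.get("nina"))                        # row-key get
    print(table.find_by_email("nina@example.com"))  # index lookup by email
    print(table.scan("a", "c"))                     # bounded scan
```

In this model the get and the bounded scan touch only the regions that own
the requested keys, while the email lookup touches all of them, so its
cost grows with the number of regions even though each region answers
from its own index.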

