hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Luke Forehand <luke.foreh...@networkedinsights.com>
Subject Re: Secondary Index versus Full Table Scan
Date Tue, 03 Aug 2010 16:36:50 GMT
Edward Capriolo <edlinuxguru@...> writes:

> Generally speaking: If you are doing full range scans of a table
> indexes will not help. Adding indexes will make the performance worse,
> it will take longer to load your data and now fetching the data will
> involve two lookups instead of one.
> 
> If you are doing full range scans adding more nodes should result in
> linear scale up.
> 
> 

Edward,

Can you clarify what "full range scan" means?  I am not doing "full" range
scans, but I am doing relatively large range scans (3 million records), so I
think what you are saying applies.  Thanks for the insight.  

We initially implemented the secondary index out of a need to have our main data
sorted by multiple dimensions for various use cases.  Now I'm thinking it may be
better to have multiple copies of our main data, sorted in multiple ways, to
avoid the two lookups.  So I'm faced with two options right now; multiple copies
of the data sorted in multiple ways to do range scans, or buy a lot more servers
and do full scans.  Given these two choices, do people have general
recommendations on which makes the most sense?

Thanks!
-Luke


Mime
View raw message