hbase-user mailing list archives

From TuX RaceR <tuxrace...@gmail.com>
Subject Re: random access and hotspots
Date Thu, 11 Mar 2010 21:40:54 GMT
Hi Alex

Thanks again for your detailed answer.

Alex Baranov wrote:

> So, 2 to 50 columns in each row. If the size of a single row (in
> bytes) is not large and the request load (the number of concurrent
> clients performing the described queries) is heavy, then you should
> probably consider simple data duplication, i.e. the rows with composite
> keys (which you've put in the "Indexes" table) will contain all the
> data you need.

Yes, that is what I thought too after reading the typical read 
performance at:
Having to do 100 random accesses to generate just one web page would be 
too costly.
Even more so because the list pages pointing to the documents do not 
need to show the whole document content (just the information necessary 
to generate a link and maybe a short summary).
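
To make the duplication idea concrete, here is a minimal sketch in plain Python. It is not HBase code: `SortedTable` is a hypothetical in-memory stand-in for a table whose rows are kept sorted by key, and the key layout `<tag>/<zero-padded doc id>` is only an assumption for illustration. The point is that each index row carries the title and summary itself, so a list page becomes one short prefix scan instead of 100 random reads.

```python
import bisect


class SortedTable:
    """In-memory stand-in for a table whose rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> dict of columns

    def put(self, key, columns):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = dict(columns)

    def scan_prefix(self, prefix, limit):
        """Return up to `limit` (key, columns) pairs whose key starts with `prefix`."""
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for key in self._keys[start:]:
            if not key.startswith(prefix) or len(out) >= limit:
                break
            out.append((key, self._rows[key]))
        return out


def index_key(tag, doc_id):
    # Hypothetical composite key; zero padding keeps numeric order when sorted as text.
    return f"{tag}/{doc_id:010d}"


# Each index row duplicates the fields a list page needs (title, summary).
index = SortedTable()
for doc_id in range(1, 251):
    tag = "hbase" if doc_id % 2 else "lucene"
    index.put(index_key(tag, doc_id),
              {"title": f"Doc {doc_id}", "summary": f"Summary of doc {doc_id}"})

# One sequential scan yields everything needed to render a 100-item list page.
page = index.scan_prefix("hbase/", limit=100)
```

The cost of this layout is that every duplicated field has to be rewritten whenever the document changes.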

> Given that the total count of all
> rows would be 1-10 billion, this might work well for you. Of course, this
> would work if your data doesn't change over time (immutable).
Yes, the data may change over time, although not very often. This is the 
biggest headache I have when designing a solution using HBase ;) : 
updating indexes.
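
A minimal sketch of what that index maintenance looks like when done by hand, again in plain Python with two dicts standing in for the main table and the index table (the names `docs`, `index`, `save_doc` and the key layout are all hypothetical). When the indexed attribute changes, the old index row has to be removed and a new one written.

```python
docs = {}    # doc_id -> {"tag": ..., "title": ...}   (stand-in for the main table)
index = {}   # "<tag>/<doc_id>" -> {"title": ...}      (stand-in for the index table)


def index_key(tag, doc_id):
    # Hypothetical composite key layout.
    return f"{tag}/{doc_id:010d}"


def save_doc(doc_id, tag, title):
    """Write a document and keep the hand-made index row in step with it."""
    old = docs.get(doc_id)
    if old is not None and old["tag"] != tag:
        # The indexed attribute changed, so the old index row is now stale.
        index.pop(index_key(old["tag"], doc_id), None)
    # Write (or overwrite) the index row, including the duplicated title.
    index[index_key(tag, doc_id)] = {"title": title}
    docs[doc_id] = {"tag": tag, "title": title}


save_doc(7, "hbase", "Row key design")
save_doc(7, "lucene", "Row key design, revised")
```

In this sketch the three writes happen in one process and cannot be interrupted. In a real store without multi-row transactions, a failure between them could leave a stale or missing index row, which is exactly the consistency worry.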

> Also, have you considered IHBase and related secondary index
> implementations: the one from transactional contrib, lucene-hbase?
I have already taken a good look at Solr and a brief look at Elasticsearch.
I was interested in HBase because of its scaling capabilities.
My current system is beginning to show the limits of PostgreSQL. I could 
move the slow requests based on an SQL index to a Lucene-based index, and 
then move to HBase when the site gets bigger. Or I could invest the time 
now in an HBase solution and skip the intermediate (Lucene-based) 
stage. That's not decided yet.

I do not understand HBase secondary indexes very well, or what their 
advantages are over hand-made indexes. Does using HBase secondary 
indexes help when the data is mutable? Or is it that the indexes are 
created in a transaction, making data consistency stronger?

