hbase-user mailing list archives

From Alex Baranov <alex.barano...@gmail.com>
Subject Re: random access and hotspots
Date Fri, 12 Mar 2010 11:29:40 GMT
_If your only use case is searching_, then you might consider Solr.
It can gracefully handle a "few 10 of millions" documents. Solr also has
solutions for splitting the index and replicating it (for load balancing).
Another thing I'd suggest considering is Lucandra (
http://blog.sematext.com/2010/02/09/lucandra-a-cassandra-based-lucene-backend/
).
There is also some work under way on creating a Lucene index in HBase (
http://www.search-hadoop.com/m?id=201003091546.40151.thomas@koch.ro).

Check the http://wiki.apache.org/hadoop/DistributedLucene page for similar
solutions.

Alex Baranau

http://sematext.com
http://en.wordpress.com/tag/hadoop-ecosystem-digest/

On Thu, Mar 11, 2010 at 11:40 PM, TuX RaceR <tuxracer69@gmail.com> wrote:

> Hi Alex
>
> Thanks again for your detailed answer.
>
>
> Alex Baranov wrote:
>
>> So, 2 to 50 columns in each row. If a single row is not large (in bytes)
>> and the request load (the number of concurrent clients performing the
>> described queries) is heavy, then you should probably consider simple data
>> duplication, i.e. rows with composite keys (which you've put in the
>> "Indexes" table) will contain all the data you need.
>>
>
> Yes, that is what I thought too after reading the typical read performance
> at:
>
>
> http://www.search-hadoop.com/m?id=7c962aed1001141446v467a295ctd86f0e8a3ef77596@mail.gmail.com
> Having to do 100 random accesses to generate just one web page would be too
> costly.
> All the more so since the list pages pointing to the documents do not need
> to show the whole document content (just the information necessary to
> generate a link and maybe a short summary).
>
>
>
>> Given that the total count of all rows would be 1-10 billion, this might
>> work well for you. Of course, this would work if your data doesn't change
>> over time (is immutable).
>>
> Yes, the data may change over time, although not very often. This is the
> biggest headache I have when designing a solution using HBase ;) : updating
> indexes.
>
>
>> Also, have you considered IHBase and related secondary index
>> implementations: the one from the transactional contrib, lucene-hbase?
>>
>>
> I have already looked closely at Solr and a bit at elasticsearch.
> I was interested in HBase because of its scaling capabilities.
> My current system is beginning to show the limits of PostgreSQL. I could
> move the slow requests based on a SQL index to a Lucene-based index, and
> then move to HBase when the site gets bigger. Or I could invest the time now
> in an HBase solution and skip the intermediate (Lucene-based) stage.
> That's not decided yet.
>
> I do not understand HBase secondary indexes very well, or what their
> advantages are over hand-made indexes. Does using HBase secondary
> indexes help when the data is mutable? Or is it that indexes are
> created in a transaction, making data consistency stronger?
>
> Thanks
> TuX
>
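
The data-duplication idea quoted above (rows with composite keys that already carry the link and summary data) can be sketched roughly as follows. This is only a toy illustration, not HBase client code: `SortedTable`, `index_key`, `index_document`, the `category/doc_id` key layout, and the sample documents are all assumptions made up for the example. The in-memory table merely imitates the fact that HBase keeps rows sorted by row key, so one prefix scan can replace many random gets.

```python
import bisect


class SortedTable:
    """Toy stand-in (hypothetical) for a table whose rows are sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> dict of column values

    def put(self, key, columns):
        # Insert the key in sorted position only the first time it is seen.
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = dict(columns)

    def scan_prefix(self, prefix, limit=None):
        # One sequential scan over adjacent rows sharing the key prefix.
        start = bisect.bisect_left(self._keys, prefix)
        results = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            results.append((key, self._rows[key]))
            if limit is not None and len(results) >= limit:
                break
        return results


def index_key(category, doc_id):
    # Assumed composite key layout: "<category>/<zero-padded doc id>".
    return f"{category}/{doc_id:010d}"


def index_document(index, doc_id, category, title, summary):
    # Duplicate the fields a list page needs into the index row itself,
    # so rendering the page never has to fetch the full document.
    index.put(index_key(category, doc_id), {"title": title, "summary": summary})


index = SortedTable()
index_document(index, 42, "hadoop", "Intro to HBase", "Row keys and regions")
index_document(index, 7, "hadoop", "MapReduce basics", "Map, shuffle, reduce")
index_document(index, 13, "lucene", "Analyzers", "Tokenizing text")

# A list page for one category is a single prefix scan.
page = index.scan_prefix("hadoop/")
for key, columns in page:
    print(key, columns["title"], "-", columns["summary"])
```

The trade-off is the one raised in the thread: every duplicated copy has to be rewritten when a document changes, which is why this layout is most comfortable when the data is immutable or rarely updated.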
