hbase-user mailing list archives

From Alex Baranov <alex.barano...@gmail.com>
Subject Re: random access and hotspots
Date Thu, 11 Mar 2010 19:21:22 GMT
>> How many columns Random table would have?
>
> few 10 of millions (10^7)
>
>> What is the row size?
>
> Rows will contain from two to 50 columns

You probably meant that "few 10 of millions (10^7)" is the row count.

So, 2 to 50 columns in each row. If a single row is not large (in bytes) and
the request load (the number of concurrent clients performing the queries you
described) is heavy, then you should probably consider simple data
duplication, i.e. the rows with composite keys (the ones you've put in the
"Indexes" table) would contain all the data you need. Given that the total
count of all rows would be 1-10 billion, this might work well for you. Of
course, this only works if your data doesn't change over time (is immutable).
If you have just a few clients, then keeping in mind that
> 100 is probably an upper bound
random queries should work fast enough.
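
To make the duplication idea a bit more concrete, here is a rough sketch in
plain Python. It is only an illustration under my own assumptions: SortedTable
is a hypothetical in-memory stand-in for a table sorted by row key (it is not
the HBase client API), and the key layout "index/value/docid" as well as the
names index_key and put_document are made up for the example.

```python
import bisect


class SortedTable:
    """Hypothetical in-memory stand-in for a table sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> dict of column -> value

    def put(self, key, columns):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = dict(columns)

    def scan_prefix(self, prefix, limit):
        """Return up to `limit` (key, columns) pairs whose key starts with prefix."""
        start = bisect.bisect_left(self._keys, prefix)
        page = []
        for key in self._keys[start:start + limit]:
            if not key.startswith(prefix):
                break  # keys sharing a prefix are contiguous, so we can stop here
            page.append((key, self._rows[key]))
        return page


def index_key(index_name, index_value, doc_id):
    # Assumed layout: index name, indexed value, then doc id to keep keys unique.
    # Values containing "/" would need escaping in a real key scheme.
    return f"{index_name}/{index_value}/{doc_id}"


def put_document(table, doc_id, columns, indexed_fields):
    # Duplicate the full set of columns under one composite key per indexed
    # field, so a page of results needs no second lookup in the "Random" table.
    for field in indexed_fields:
        table.put(index_key(field, columns[field], doc_id), columns)


table = SortedTable()
put_document(table, "doc1", {"author": "ann", "year": "2009", "title": "A"},
             ["author", "year"])
put_document(table, "doc2", {"author": "bob", "year": "2009", "title": "B"},
             ["author", "year"])
put_document(table, "doc3", {"author": "ann", "year": "2010", "title": "C"},
             ["author", "year"])

# One sequential read returns a page of complete rows (100 being the page
# size upper bound mentioned in this thread).
page = table.scan_prefix("year/2009/", limit=100)
```

The trade-off is storage and write cost: every document is written once per
index, which is why this fits best when the data is immutable.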

Also, have you considered IHBase and the related secondary index
implementations: the one from the transactional contrib, lucene-hbase?

Alex Baranau

sematext.com
http://en.wordpress.com/tag/hadoop-ecosystem-digest/

On Thu, Mar 11, 2010 at 6:01 PM, TuX RaceR <tuxracer69@gmail.com> wrote:

> Hello Alex,
>
> Thank you for your mail.
>
>
> Alex Baranov wrote:
>
>> How many columns Random table would have?
>>
> few 10 of millions (10^7)
>
>
>  What is the row size?
>>
> Rows will contain from two to 50 columns
>
>  How many
>> rows are you going to fetch at one time (I assume just for displaying one
>> page with 10, 20, 100 records?)?
>>
>>
> Yes, that's correct: 100 is probably an upper bound.
>
>
>  How big is your data (estimated rows count)?
>>
>>
> few 10's of millions for Random, 100 to 1000 times more for Indexes
>
>  How many different types of "indexes" are you planning to have?
>>
>>
>>
> around 50 Indexes
>
>  ...I need to fork concurrent threads/processes to get the document
>>>
>>>
>> details...
>>
>> Yes, increasing number of threads/processes will increase performance.
>>
>>
>>
>>> how many random search hbase will stand
>>>
>>>
>> This depends on your hardware. Have a look in the MLs for some details
>> shared by others, like:
>>
>> http://www.search-hadoop.com/m?id=7c962aed1001141446v467a295ctd86f0e8a3ef77596@mail.gmail.com
>>
>>
>>
>
>
> Thanks for the Links
> cheers
> TuX
>
>
>
>  one new socket (is that true?) is created at each random access request
>>>
>>>
>> No, but this (obviously) causes a new IPC call (one can use
>> Scan.get(int count) to fetch more rows in a single call).
>>
>> Alex Baranau
>>
>>
>
>
