hbase-user mailing list archives

From Andrey Stepachev <oct...@gmail.com>
Subject Re: HBase secondary index performance
Date Mon, 06 Sep 2010 18:46:00 GMT
2010/9/6 Murali Krishna. P <muralikpbhat@yahoo.com>:
> Hi,
>   My row size is around 300 bytes with total 20 columns. I tried the custom
> indexing without the write to WAL. Currently having only 2 tables, one for the
> main table and another for all 20 indexes. My key to the index table is
> columnValue+columnName+rowKey.

As mentioned before, you can randomize your index insertions.
If you don't need ordered scans or range scans on columnValue, you can
prefix the key with a hash, e.g. sha(columnValue) + columnValue +
columnName + rowKey.
This removes the hotspot on one of your region servers.
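A minimal Java sketch of the salted index key described above. It assumes SHA-1 as the hash (the reply only says "some hash") and keeps the thread's layout of plain byte concatenation with no separators. The class and method names (`SaltedIndexKey`, `indexKey`, `lookupPrefix`) are hypothetical and not part of any HBase API. Because the prefix is derived from the value itself, an exact-match lookup on a value can still recompute sha(columnValue) + columnValue and scan rows starting with it; only range scans across different values are lost.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SaltedIndexKey {

    // SHA-1 is an assumption; the reply only asks for "some hash" of columnValue.
    public static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException(e);
        }
    }

    // Concatenates the given parts, in order, into one byte array.
    private static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : parts) {
            out.write(p, 0, p.length);
        }
        return out.toByteArray();
    }

    // Prefix shared by every index entry for one value: sha(columnValue) + columnValue.
    // An exact-match lookup can recompute it and scan rows that start with it.
    public static byte[] lookupPrefix(byte[] columnValue) {
        return concat(sha1(columnValue), columnValue);
    }

    // Full index row key: sha(columnValue) + columnValue + columnName + rowKey.
    // As in the key layout from the thread, no separators are used, so the key
    // cannot be parsed back into its parts unambiguously.
    public static byte[] indexKey(byte[] columnValue, byte[] columnName, byte[] rowKey) {
        return concat(sha1(columnValue), columnValue, columnName, rowKey);
    }

    public static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] key = indexKey(bytes("red"), bytes("color"), bytes("row1"));
        // 20 hash bytes + "red" (3) + "color" (5) + "row1" (4)
        System.out.println(key.length); // prints 32
    }
}
```

Since the leading bytes of each key are effectively random, consecutive inserts with different values land in different parts of the key space, and therefore in different regions once the index table has split.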

> I am getting around 500 inserts/second now (i.e., a total of ~10K puts). This is
> probably comparable with your numbers based on the data size.
Do all region servers get an equal load, or are some servers busier
than others?

>  I have some doubts with the hbase write implementation.
> * Is this the max that we can achieve with any number of region servers? Why is
> adding region servers not improving the write performance? Is it because when
> the data doesn't exist in the table, it always writes to one region?
In general, yes. Until the table splits, all writes go into a single
region, which is served by one region server.
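To illustrate why a new table sends every write to one server, here is a toy Java model of how a row key is routed to a region. It is an assumption for illustration, not HBase client code: regions are kept in a sorted map keyed by their start key, and a row belongs to the region with the greatest start key not above the row key. The region names, split points, and sample keys are made up. `String` comparison stands in for HBase's byte-wise key ordering, which agrees with it for the ASCII keys used here.

```java
import java.util.TreeMap;

public class RegionRouting {

    // Toy routing rule (assumption): the region whose start key is the greatest
    // one <= rowKey owns the row. The first region always starts at "".
    public static String regionFor(TreeMap<String, String> regions, String rowKey) {
        return regions.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        // A freshly created table: one region covering the whole key space.
        TreeMap<String, String> fresh = new TreeMap<>();
        fresh.put("", "region-0");

        // The same table after it has split into four key ranges.
        TreeMap<String, String> split = new TreeMap<>();
        split.put("", "region-0");
        split.put("4", "region-1");
        split.put("8", "region-2");
        split.put("c", "region-3");

        // Hex-prefixed keys standing in for sha(columnValue) + ... index keys.
        String[] keys = {"1f-red", "5a-blue", "9c-green", "e3-black"};
        for (String k : keys) {
            System.out.println(k + " -> " + regionFor(fresh, k) + " / " + regionFor(split, k));
        }
        // Prints:
        // 1f-red -> region-0 / region-0
        // 5a-blue -> region-0 / region-1
        // 9c-green -> region-0 / region-2
        // e3-black -> region-0 / region-3
    }
}
```

With only the initial region, every key lands in `region-0`; once split points exist, hashed prefixes spread the inserts over all four regions, which is what allows additional region servers to share the write load.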

> * Probably writing to an existing, well distributed table might give better
> performance since the writes will be across machines ? In that case, if we have
> multiple tables (one per index), will it be better during the initial write
> itself (since each table has different region)?
The more region servers that take part in the writes, the better.

 Andrey.
