hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Benchmarking and improvement of HBase's performance for a common bulk data workload
Date Sat, 27 Apr 2013 09:14:08 GMT
Thanks for thinking about ways to optimize such a workload.

You can start with the following when setting up your cluster:

For transactions, HBase is unique compared with PostgreSQL. See:


On Sat, Apr 27, 2013 at 1:20 PM, Atri Sharma <atri.jiit@gmail.com> wrote:

> Hi all,
> I have been discussing the following style of workload with Priyank
> sir, and whether we can improve HBase's performance in this area.
> The use case is as follows:
> 1) Bulk load data.
> 2) Query the data multiple times (mostly read access, and no real-time
> writes).
> This is a common workload, and I am pretty interested in benchmarking
> HBase's performance in this area, as well as improving it further.
> Please advise me on how I can proceed with benchmarking. Specifically,
> how will I need to set up an HBase cluster, and will there be any
> specific requirements of the cluster for this type of testing?
> I worked on a patch to improve performance for a similar use case in
> PostgreSQL. The case is pretty similar: bulk load of data, a large
> number of mostly read-only queries, and then deletion of the data.
> The optimization I targeted was the cost of writes to disk.
> Specifically, setting flags (hint bits) to track the commit
> status of the inserting/deleting transaction was causing write overhead.
> I tried to mitigate this by adding a cache that holds the transaction
> id for the above-mentioned workload, hence reducing the cost
> of writes.
> I will start benchmarking once I have the system set up and then start
> thinking of tests. Once I have an outline in my mind, I shall post it
> on the list.
> I will need a lot of guidance from the community on this.
> Thoughts/Comments/Advice please?
> Regards,
> Atri
> --
> Regards,
> Atri
> l'apprenant
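
The hint-bit optimization described in the quoted message can be illustrated with a toy model. The sketch below is not PostgreSQL code and not the patch Atri mentions; every name in it (CommitStatusCache, scan_with_hint_bits, scan_with_cache, the row layout) is hypothetical. It assumes that a reader must learn whether the transaction that wrote a row has committed, and compares recording that answer in each row (one write per row) with keeping it in a small in-memory cache keyed by transaction id (no row writes).

```python
from collections import OrderedDict


class CommitStatusCache:
    """Tiny LRU cache mapping a transaction id to its committed flag."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self._entries = OrderedDict()

    def lookup(self, xid, slow_lookup):
        # Cache hit: refresh recency and return the stored status.
        if xid in self._entries:
            self._entries.move_to_end(xid)
            return self._entries[xid]
        # Cache miss: ask the authoritative source once, then remember it.
        committed = slow_lookup(xid)
        self._entries[xid] = committed
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return committed


def scan_with_hint_bits(rows, commit_log):
    """Baseline: store the status in each row, which counts as a write."""
    visible = 0
    writes = 0
    for row in rows:
        if row["hint"] is None:
            row["hint"] = commit_log[row["xid"]]
            writes += 1
        visible += row["hint"]
    return visible, writes


def scan_with_cache(rows, commit_log, cache):
    """Alternative: consult the cache; rows are never modified."""
    lookups = 0

    def slow_lookup(xid):
        nonlocal lookups
        lookups += 1
        return commit_log[xid]

    visible = sum(cache.lookup(row["xid"], slow_lookup) for row in rows)
    return visible, lookups


def make_rows(count, xid):
    """Simulate a bulk load: every row was written by the same transaction."""
    return [{"xid": xid, "hint": None} for _ in range(count)]
```

In a bulk-load workload most rows share a handful of transaction ids, so in this model a very small cache is enough to answer nearly every lookup without touching the rows.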
