hbase-user mailing list archives

From Tom Brown <tombrow...@gmail.com>
Subject Re: aggregation performance
Date Thu, 03 May 2012 15:01:37 GMT
For our solution we are doing some aggregation on the server via
coprocessors. In general, for each row there are 8 columns: 7 columns
that contain numbers (for summation) and 1 column that contains a
HyperLogLog counter (about 700 bytes). Functionally, this solution
works well and ought to scale with the number of region servers.
However, the individual request performance leaves a little to be
desired. What we've seen is that scanning 40,000 rows (aggregated into
3,000 rows) takes about 4 seconds.
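
For illustration only, here is a minimal sketch of the kind of per-group aggregation described above: the 7 numeric columns are summed, and the HyperLogLog counters are merged by taking the register-wise maximum, which is the standard way to union two HyperLogLog sketches of the same size. This is not our coprocessor code and it does not use the HBase API. The names AggregationSketch, Row, Aggregate and groupKey are hypothetical, and it assumes each counter is stored as a plain byte array with one register per byte.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, HBase-free sketch of the aggregation step only.
final class AggregationSketch {
    static final int NUM_SUM_COLUMNS = 7;

    // One input row: a grouping key, 7 numbers, and an HLL register array.
    static final class Row {
        final String groupKey;
        final long[] values;
        final byte[] hll;

        Row(String groupKey, long[] values, byte[] hll) {
            this.groupKey = groupKey;
            this.values = values;
            this.hll = hll;
        }
    }

    // Running aggregate for one output row.
    static final class Aggregate {
        final long[] sums = new long[NUM_SUM_COLUMNS];
        final byte[] registers;

        Aggregate(int registerCount) {
            this.registers = new byte[registerCount];
        }

        void add(Row row) {
            if (row.values.length != NUM_SUM_COLUMNS
                    || row.hll.length != registers.length) {
                throw new IllegalArgumentException("unexpected column size");
            }
            // Plain summation for the numeric columns.
            for (int i = 0; i < NUM_SUM_COLUMNS; i++) {
                sums[i] += row.values[i];
            }
            // HLL union: keep the larger register value at each position.
            // Register values are small non-negative numbers, so a signed
            // byte comparison is safe here.
            for (int i = 0; i < registers.length; i++) {
                if (row.hll[i] > registers[i]) {
                    registers[i] = row.hll[i];
                }
            }
        }
    }

    // Folds the scanned rows into one Aggregate per group key.
    static Map<String, Aggregate> aggregate(Iterable<Row> rows, int registerCount) {
        Map<String, Aggregate> out = new LinkedHashMap<>();
        for (Row row : rows) {
            out.computeIfAbsent(row.groupKey, k -> new Aggregate(registerCount))
               .add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("a", new long[] {1, 2, 3, 4, 5, 6, 7}, new byte[] {1, 0, 3, 0}),
            new Row("a", new long[] {1, 1, 1, 1, 1, 1, 1}, new byte[] {0, 2, 1, 0}),
            new Row("b", new long[] {9, 0, 0, 0, 0, 0, 0}, new byte[] {0, 0, 0, 5}));
        Map<String, Aggregate> result = aggregate(rows, 4);
        System.out.println("groups: " + result.keySet());
        System.out.println("first sum for a: " + result.get("a").sums[0]);
    }
}
```

In a real coprocessor the rows would come from a region scanner rather than an in-memory list, and the register layout would depend on the HyperLogLog library in use.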

Our code is in its early stages (unoptimized) so we hope to see some
significant performance improvements when we run our coprocessor under
a profiler. Our benchmarks were on underpowered machines (only 2 GB
RAM) as well.

Hope this helps!


On Thu, May 3, 2012 at 6:08 AM, Pere Ferrera <ferrerabertran@gmail.com> wrote:
> Hi,
> Is anybody benchmarking the performance of server-side aggregations through
> co-processors in HBase? I am interested to know if HBase could potentially
> be used to calculate real-time SQL-like aggregations at a good level of
> performance (q < 200ms on high-load, big dataset scenario). Just curious to
> know before I implement my own benchmarks.
> Pere.
