kudu-user mailing list archives

From Alexey Serbin <aser...@cloudera.com>
Subject Re: [Benchmarking]
Date Tue, 14 Mar 2017 18:25:17 GMT

A benchmark like that is not a trivial undertaking, and there is a lot to
consider when running one.  More senior members of the Kudu team could
probably suggest more, but right away I can suggest the following:

1. Consider using physical machines for the benchmark, not VMs.  Make sure
both databases store their data on the same kind of storage media when
doing the comparison.

2. Make sure your benchmark schema is supported by both Kudu and
PostgreSQL.  To perform the benchmark you would probably need to tweak
your existing schema a little bit.  Kudu supports a subset of the types
available in PostgreSQL.  Also, pay attention to primary keys/indices and
partitions if you are running read/scan comparisons.  Overall, in this
context it's worth reading this document first:
https://kudu.apache.org/docs/schema_design.html

3. Kudu is supposed to shine when working with huge amounts of data spread
across multiple machines in a cluster.  Are you planning to use a clustered
setup for PostgreSQL as well?  It may be worth trying one.

4. While creating Kudu tables, use just a single replica -- additional
replicas add some latency for write operations because a write operation
is considered successful only when it is acknowledged by a majority of the
existing replicas.  Also,
since I didn't see

5. Consider placing the WAL for both Kudu and PostgreSQL on an SSD -- this
lowers latencies for DML operations.  I know that's so at least for Kudu,
and I would expect that's true for PostgreSQL as well.

6. Pay some attention to run-time resource limits in effect while running
those benchmarks:
  https://kudu.apache.org/docs/configuration_reference.html (search for
flags containing 'memory' and 'cache_size' in their names)
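
To make the majority rule in point 4 concrete, here is a small sketch of
the arithmetic.  The function name is just for illustration; the only
assumption is that a majority means more than half of the replicas.

```python
def majority(num_replicas: int) -> int:
    """Smallest number of replicas that is more than half of num_replicas."""
    if num_replicas < 1:
        raise ValueError("a tablet needs at least one replica")
    return num_replicas // 2 + 1


# With one replica a write needs one acknowledgement; with three it needs
# two, so at least one of them has to travel over the network.
for n in (1, 3, 5):
    print(f"{n} replica(s): write needs {majority(n)} acknowledgement(s)")
```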

As for inserting your existing data into Kudu, consider using Impala:
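
A minimal sketch of what such an Impala statement might look like, built
as a string in Python so the pieces are easy to swap.  The table names
(metrics_kudu, metrics_src), the column names and the partition count are
hypothetical.  The clause order follows Impala's CREATE TABLE ... STORED AS
KUDU ... AS SELECT form, and kudu.num_tablet_replicas is set to 1 to match
point 4.  Please check it against the Impala version you run.

```python
def kudu_ctas(target, source, columns, primary_key,
              hash_partitions=4, replicas=1):
    """Build an Impala CREATE TABLE ... AS SELECT statement for a Kudu table.

    All arguments are illustrative; nothing here talks to a real cluster.
    """
    missing = [c for c in primary_key if c not in columns]
    if missing:
        raise ValueError(f"primary key columns not selected: {missing}")
    # Kudu wants the primary key columns to come first in the schema,
    # so put them at the front of the SELECT list.
    ordered = list(primary_key) + [c for c in columns if c not in primary_key]
    pk = ", ".join(primary_key)
    return (
        f"CREATE TABLE {target}\n"
        f"PRIMARY KEY ({pk})\n"
        f"PARTITION BY HASH ({pk}) PARTITIONS {hash_partitions}\n"
        f"STORED AS KUDU\n"
        f"TBLPROPERTIES ('kudu.num_tablet_replicas' = '{replicas}')\n"
        f"AS SELECT {', '.join(ordered)} FROM {source}"
    )


# Hypothetical time-series table keyed on (host, ts).
print(kudu_ctas("metrics_kudu", "metrics_src",
                columns=["value", "ts", "host"],
                primary_key=["host", "ts"]))
```

The resulting text can be pasted into impala-shell or the Hue query editor.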

Best regards,


On Tue, Mar 14, 2017 at 8:01 AM, paulo faria <zikoco2@hotmail.com> wrote:

> Hi,
> I'm doing a benchmark of Kudu (and other time-series DBs) versus PostgreSQL
> 9.6.
> I've already done your VM demo tutorial.
> Now I would like to compare the two.  I already have the PostgreSQL
> environment set up (with some tables + data (1 GB per table to test)) on a
> remote server.
> 1) What is your advice for comparing query (read) performance?
> 2) Is there any way to convert (or migrate) the Postgres structure to Kudu?
> I have my database in HUE Impala, so I can query it there and also download
> the data from there.
> Any tips are appreciated.
> Best Regards
> Paulo Faria
