kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Composite primary key
Date Tue, 05 Sep 2017 23:38:30 GMT
Hi Janne,

This is a good question.

If you never plan on actually querying based on those columns themselves,
concatenating them into a binary column as the single PK will save a bit of
space relative to storing them separately. In the case of a composite
primary key, Kudu internally encodes the key columns into a single
concatenated binary value and stores it using prefix encoding. So, if you
store them separately, you'll get the same composite binary encoding plus
the additional storage for the separate columns.
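
To make the encoding trade-off concrete, here is a minimal Python sketch of an
order-preserving composite key for (topic, partition, offset). It assumes a
scheme like the one Kudu's key encoding is generally described as using
(integers written big-endian with the sign bit flipped; a non-final binary
column with embedded zero bytes escaped as `\x00\x01` and terminated by
`\x00\x00`), but the exact byte layout and every function name here are
illustrative assumptions, not Kudu's own code.

```python
import struct


def encode_int32(value: int) -> bytes:
    # Flip the sign bit (add 2**31) and write big-endian, so that unsigned
    # byte-wise comparison matches signed numeric order.
    return struct.pack(">I", value + (1 << 31))


def encode_int64(value: int) -> bytes:
    # Same idea for 64-bit integers such as Kafka offsets.
    return struct.pack(">Q", value + (1 << 63))


def encode_binary(data: bytes, is_last: bool) -> bytes:
    # The last key column can be stored verbatim. An earlier one needs a
    # terminator so that "a" + <next column> still sorts before "ab".
    if is_last:
        return data
    # Escape embedded zero bytes, then terminate with two zero bytes.
    return data.replace(b"\x00", b"\x00\x01") + b"\x00\x00"


def encode_kafka_key(topic: str, partition: int, offset: int) -> bytes:
    # Hypothetical composite key, columns in key order: topic, partition, offset.
    return (
        encode_binary(topic.encode("utf-8"), is_last=False)
        + encode_int32(partition)
        + encode_int64(offset)
    )


def shared_prefix_len(a: bytes, b: bytes) -> int:
    # Count the leading bytes two keys have in common; prefix encoding
    # avoids storing this shared part again for the next key.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


k41 = encode_kafka_key("orders", 3, 41)
k42 = encode_kafka_key("orders", 3, 42)
print(len(k42))                     # 20
print(shared_prefix_len(k41, k42))  # 19
```

In this sketch the key for ('orders', 3, 42) is 20 bytes, and consecutive
offsets in the same topic partition share their first 19 bytes, which is why
prefix encoding suits such keys. Keeping topic, partition and offset as
separate columns stores their values in addition to this encoded key.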

However, if you have any use case for querying based on them, having the
separate columns would be quite useful, since Kudu can push down predicates
to individual columns.
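
The predicate-pushdown point can be illustrated with a toy model in plain
Python (it does not call Kudu's client API). With separate columns, a filter
such as `partition == 1` is a direct comparison on one column; with a single
manually encoded key, every key has to be fetched and decoded before the same
check can run. The `topic/partition/offset` string format and the function
names are hypothetical.

```python
from typing import List, NamedTuple


class KafkaRow(NamedTuple):
    topic: str
    partition: int
    offset: int


def concat_key(row: KafkaRow) -> str:
    # Hypothetical manual encoding: one opaque "topic/partition/offset" string.
    return f"{row.topic}/{row.partition}/{row.offset}"


def filter_by_column(rows: List[KafkaRow], partition: int) -> List[KafkaRow]:
    # Separate columns: the predicate is a plain comparison on one column,
    # the kind of check a storage engine can evaluate without decoding keys.
    return [r for r in rows if r.partition == partition]


def filter_by_concatenated_key(keys: List[str], partition: int) -> List[str]:
    # Opaque key: the storage layer cannot see "partition", so every key
    # must be returned and split apart on the client before filtering.
    matches = []
    for key in keys:
        _topic, part, _offset = key.rsplit("/", 2)
        if int(part) == partition:
            matches.append(key)
    return matches


rows = [KafkaRow("orders", 0, 10), KafkaRow("orders", 1, 11), KafkaRow("payments", 1, 5)]
print(filter_by_column(rows, 1))
print(filter_by_concatenated_key([concat_key(r) for r in rows], 1))  # ['orders/1/11', 'payments/1/5']
```

A filter on the leading field (one topic) can still be expressed as a range
over the combined key, but in general a filter on partition or offset alone
cannot be turned into a single key range.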

Being able to use the subfields for partitioning is also likely to be
useful: for example, you might want to hash-partition on 'topic+partition'
together so that all data for a given topic partition always ends up stored
together. This wouldn't be possible if you used a combined (manually
encoded) key.
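
A small simulation shows why hash partitioning needs the separate columns:
hashing only topic and partition sends every offset of one Kafka partition to
the same bucket, while hashing a combined key (offset included) scatters those
rows. The bucket count, the CRC-32 stand-in hash and the function names are
assumptions for illustration; Kudu computes hash buckets with its own internal
hash function.

```python
import zlib
from typing import Tuple

NUM_BUCKETS = 4  # hypothetical bucket count for the table


def bucket_for_row(row: Tuple[str, int, int], num_buckets: int = NUM_BUCKETS) -> int:
    # Only the hash columns (topic, partition) feed the hash; the offset is
    # ignored, so every offset of one Kafka partition lands in one bucket.
    # CRC-32 is a stand-in for Kudu's own hash function.
    topic, partition, _offset = row
    hash_input = topic.encode("utf-8") + b"\x00" + partition.to_bytes(4, "big", signed=True)
    return zlib.crc32(hash_input) % num_buckets


def bucket_for_concatenated_key(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    # With one manually encoded key column, the whole string (offset
    # included) is the only thing available to hash.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


orders_3 = [("orders", 3, offset) for offset in range(1000)]
print({bucket_for_row(row) for row in orders_3})  # a single bucket
print({bucket_for_concatenated_key(f"orders/3/{o}") for _, _, o in orders_3})
```

Every row of `orders`/3 maps to one bucket in the first case, while the same
rows spread over several buckets when the combined string is hashed.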


On Fri, Aug 25, 2017 at 11:10 PM, Janne Keskitalo <janne.keskitalo@paf.com> wrote:

> Hi
> We're inserting messages from Kafka into Kudu tables and some messages
> don't have a natural primary key, hence we decided to use the Kafka
> topic/partition/offset combination as the key. Is it better to concatenate
> the fields into one Kudu column or create a separate column for each? Do we
> get better compression if using individual columns? And is the PK index
> structure maintained outside of the actual table data?
> --
> Br.
> Janne Keskitalo,
> Database Architect, PAF.COM
> For support: dbdsupport@paf.com

Todd Lipcon
Software Engineer, Cloudera
