kudu-user mailing list archives

From Dan Burkert <...@cloudera.com>
Subject Re: User attributes schema design with Kudu
Date Sat, 16 Jan 2016 02:42:38 GMT
Hi Buntu,

This sounds like it could be a good use case for Kudu with the right
schema.  I would model it as a table of events, with each event
corresponding to adding or removing an attribute from a user.  Something
like the following:

CREATE TABLE user_events (
  user_id int64 NOT NULL,
  time TIMESTAMP NOT NULL,
  attribute STRING NOT NULL,
  value BOOL NOT NULL
)
PRIMARY KEY (user_id, attribute, time)
DISTRIBUTE BY HASH (user_id) INTO 64 BUCKETS;
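
To make the event model concrete, here is a minimal sketch in plain Python.
It is not Kudu client code; the names Event and value_at are made up for
illustration, and integers stand in for timestamps. It shows that an
attribute's value at time t is the value of the latest event at or before
t, and that a new attribute such as 'is_payer' is just a new row rather
than a new column.

```python
from typing import List, NamedTuple, Optional


class Event(NamedTuple):
    """One row of the hypothetical user_events table."""
    user_id: int
    attribute: str
    time: int    # integer stand-in for TIMESTAMP
    value: bool  # True = attribute added, False = attribute removed


def value_at(events: List[Event], user_id: int, attribute: str, t: int) -> Optional[bool]:
    """Return the value of the latest matching event at or before t.

    Returns None if the user has no event for this attribute by time t.
    """
    latest = None
    for e in events:
        if e.user_id == user_id and e.attribute == attribute and e.time <= t:
            if latest is None or e.time > latest.time:
                latest = e
    return latest.value if latest is not None else None


# Sample data, invented for this example.
events = [
    Event(1, "is_payer", 100, True),   # user 1 becomes a payer
    Event(1, "is_payer", 200, False),  # user 1 stops being a payer
    Event(2, "is_payer", 150, True),
]

if __name__ == "__main__":
    print(value_at(events, 1, "is_payer", 150))
    print(value_at(events, 1, "is_payer", 250))
```

The linear scan is only there to keep the sketch short; in the real table
the primary key order is what makes this lookup cheap.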

The key to getting good performance on the queries is designing an
efficient primary key and distribution strategy.  The schema above lays out
the data organized by user and attribute, and then by time.  This layout
will be efficient for the first and third queries, since it allows
substantial portions of the table to be ignored for individual queries.
For the second query it's not quite as efficient as if the table were laid
out by attribute and timestamp, but that query is touching a large portion
of the table anyway, so there is less opportunity for pruning.
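
As a rough illustration of why the key order matters, the sketch below
models the table as a Python list sorted by (user_id, attribute, time) and
uses binary search in place of a key-range scan. This is an assumption-laden
toy, not how Kudu is implemented: hash bucketing is ignored, the function
names are hypothetical, and the second query is simplified to "an event with
that value occurred inside the window".

```python
from bisect import bisect_left

# Rows sorted by the primary key (user_id, attribute, time); last field is value.
# Sample data, invented for this example.
rows = sorted([
    (1, "is_payer", 100, True),
    (1, "is_payer", 200, False),
    (1, "is_trial", 50, True),
    (2, "is_payer", 150, True),
    (3, "is_payer", 400, True),
])
keys = [r[:3] for r in rows]


def scan(lo, hi):
    """Return rows whose key is in [lo, hi), located by binary search."""
    return rows[bisect_left(keys, lo):bisect_left(keys, hi)]


def user_attribute_up_to(user_id, attribute, t):
    """Query 1: one contiguous key range for a single user and attribute."""
    return scan((user_id, attribute), (user_id, attribute, t + 1))


def all_attributes_of(user_id):
    """Query 3: one contiguous key range covering a single user."""
    return scan((user_id,), (user_id + 1,))


def users_with(attribute, value, t1, t2):
    """Query 2: matches are spread across every user, so all rows are visited."""
    return {u for (u, a, t, v) in rows if a == attribute and v == value and t1 <= t <= t2}


if __name__ == "__main__":
    print(user_attribute_up_to(1, "is_payer", 150))
    print(all_attributes_of(1))
    print(users_with("is_payer", True, 100, 200))
```

The first and third functions touch only a small slice of the sorted rows,
while the second has to look at every user's rows, which mirrors the
trade-off described above.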

- Dan


On Thu, Jan 14, 2016 at 10:16 PM, Buntu Dev <buntudev@gmail.com> wrote:

> I would like to know if Kudu is a good option for time series analysis for
> my use case which involves assigning attributes to a user dynamically based
> on the user actions and be able to answer these questions:
>
> * Does the given user have attribute X with value Y at a given time t1?
> * Get list of users who had attribute X with value Y between timestamps t1
> and t2?
> * Get all the attributes of user at or around a given time t1.
>
> I read that Kudu needs a predefined schema but my use case requires adding
> columns on the fly, for example, based on a payment transaction event I
> would like to add a 'is_payer' column on the fly setting it to true.
>
>
> Thanks for the input!
>
