cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: Modeling Audit Trail on Cassandra
Date Wed, 16 Mar 2016 20:52:28 GMT
Stratio (or DSE Search) should be good for ad hoc or complex queries, but
if there are some fixed/common query patterns you might be better off
implementing query tables or using materialized views. The latter allows
you to include a non-PK data column in the PK of the MV so that you can
directly access the indexed row without the complexity of Lucene/DSE. This
also lets you effectively cluster data that will be commonly accessed
together on a single node/partition, and to do it automatically without any
application logic to manually duplicate/update data.

(3.x still has the restriction that an MV PK can only include one non-PK
data column - CASSANDRA-9928
<https://issues.apache.org/jira/browse/CASSANDRA-9928>.)

-- Jack Krupansky

On Wed, Mar 16, 2016 at 4:40 PM, I PVP <ipvp@hotmail.com> wrote:

> Jack/Tom
> Thanks for answering.
>
> Here is the table definition so far:
>
> CREATE TABLE audit_trail (
> auditid timeuuid,
> actiontype text,
> objecttype text,
> executedby uuid ( or timeuuid?),
> executedat timestamp,
> objectbefore text,
> objectafter text,
> clientipaddr text,
> serveripaddr text,
> servername text,
> channel text,
> PRIMARY KEY (auditid)
> );
>
> objectbefore/after are the only ones that will have JSON content. quering
> based on the contents of these two  columns are not a requirement.
>
> At this moment the queries are going to be mainly on executedby ( the
> employee id).
> Stratio’s Cassandra Lucene Index will be used to allow querying/filtering
> on executedat (timestamp) ,objecttype(order, customer, ticket,
> message,account, paymenttransaction,refund etc.)  and actiontype(create,
> retrieve, update, delete, approve, activate, unlock, lock etc.) .
>
> I am considering to count exclusively on Stratio’s Cassandra Lucene
>  filtering and avoid to add  “period” columns like month(int), year(int),
> day (int).
>
> Thanks
>
> --
> IPVP
>
>
> From: Jack Krupansky <jack.krupansky@gmail.com> <jack.krupansky@gmail.com>
> Reply: user@cassandra.apache.org <user@cassandra.apache.org>>
> <user@cassandra.apache.org>
> Date: March 16, 2016 at 5:22:36 PM
> To: user@cassandra.apache.org <user@cassandra.apache.org>>
> <user@cassandra.apache.org>
> Subject:  Re: Modeling Audit Trail on Cassandra
>
> executedby is the ID assigned to an employee.
>
> I'm presuming that JSON is to be used for objectbefore/after. This
> suggests no ability to query by individual object fields. I didn't sense
> any other columns that would be JSON.
>
>
>
> -- Jack Krupansky
>
> On Wed, Mar 16, 2016 at 3:48 PM, Tom van den Berge <tom@drillster.com>
> wrote:
>
>> Is text the most appropriate data type to store JSON that contain couple
>>> of dozen lines ?
>>>
>>
>> It sure is the simplest way to store JSON.
>>
>> The query requirement  is  "where executedby = ?”.
>>>
>>
>> Since executedby is a timeuuid, I guess you don't want to query a single
>> record, since that would require you to know the exact timeuuid. Do you
>> mean that you would like to query all changes in a certain time frame, e.g.
>> today? In that case, you would have to group your rows in time buckets,
>> e.g. PRIMARY KEY ((period), auditid). Period can be a day, month, or any
>> other period that suits your situation. Retrieving all changes in a
>> specific time frame is done by retrieving all relevant periods.
>>
>> Tom
>>
>
>

Mime
View raw message