cassandra-user mailing list archives

From "Durity, Sean R" <SEAN_R_DUR...@homedepot.com>
Subject RE: [EXTERNAL] Re: Running select against cassandra
Date Thu, 06 Feb 2020 21:15:30 GMT
Based on reports on this mailing list, I do not allow materialized views.


Sean Durity

From: Reid Pinchback <rpinchback@tripadvisor.com>
Sent: Thursday, February 6, 2020 4:10 PM
To: user@cassandra.apache.org
Subject: Re: [EXTERNAL] Re: Running select against cassandra

Abdul,

When in doubt, have a query model that immediately feeds you exactly what you are looking
for. That’s kind of the data model philosophy that you want to shoot for as much as feasible
with C*.

The point of Sean’s table isn’t its similarity to yours; it is how he has it keyed, because
that gives a partition structure much better aligned with what you want to request.  So I’d
say yes, if a materialized view is how you want to achieve a denormalized state where the
query model directly gives you what you want to query for, that sounds like an appropriate
option to consider.  You might want a composite partition key so that selecting narrow time
ranges is efficient.
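
A minimal sketch of what a composite partition key could look like here. The table name active_users_by_hour, its columns, and the example date are hypothetical and chosen only for illustration; they are not anyone's actual schema.

```cql
-- Hypothetical table: one partition per (day, hour), one row per minute
CREATE TABLE active_users_by_hour (
    app_date date,
    hour int,
    minute int,
    user_count int,
    PRIMARY KEY ((app_date, hour), minute)
);

-- A 5-minute window is a clustering-range read inside a single partition
SELECT minute, user_count
FROM active_users_by_hour
WHERE app_date = '2020-02-06' AND hour = 21
  AND minute >= 10 AND minute < 15;
```

With the day and hour together as the partition key, a narrow time range touches one small partition instead of a full day of rows. The trade-off is that a whole-day report then needs one query per hour.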

From: Abdul Patel <abd786.ap@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, February 6, 2020 at 2:42 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: [EXTERNAL] Re: Running select against cassandra

This schema is similar to what we have. They want the count of concurrently connected users
every 1 to 5 minutes, say.
I am wondering whether a simple select will have performance issues, or whether we should go
for materialized views?

CREATE TABLE usr_session (
    userid bigint,
    session_usr text,
    last_access_time timestamp,
    login_time timestamp,
    status int,
    PRIMARY KEY (userid, session_usr)
) WITH CLUSTERING ORDER BY (session_usr ASC);


On Thu, Feb 6, 2020 at 2:09 PM Durity, Sean R <SEAN_R_DURITY@homedepot.com> wrote:
Do you only need the current count or do you want to keep the historical counts also? By active
users, does that mean some kind of user that the application tracks (as opposed to the Cassandra
user connected to the cluster)?

I would consider a table like this for tracking active users through time:

CREATE TABLE users_by_day (
    app_date date,
    hour int,
    minute int,
    user_count int,
    longest_login_user text,
    longest_login_seconds int,
    last_login timestamp,
    last_login_user text,
    PRIMARY KEY (app_date, hour, minute)
);

Then, your reporting can easily select full days or a specific, one-minute slice. Of course,
the app would need to have a timer and write out the data. I would also suggest a TTL on the
data so that you only keep what you need (a week, a year, whatever). Of course, if your reporting
requires different granularities, you could consider a different time bucket for the table
(by hour, by week, etc.)
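
As a rough sketch of the reads and the timer-driven write described above, assuming the users_by_day table as keyed here (app_date as the partition key, hour and minute as clustering columns). The date, the count, and the one-week TTL are made-up example values.

```cql
-- Full day: every minute bucket for that date comes from one partition
SELECT hour, minute, user_count
FROM users_by_day
WHERE app_date = '2020-02-06';

-- A specific one-minute slice
SELECT user_count
FROM users_by_day
WHERE app_date = '2020-02-06' AND hour = 21 AND minute = 15;

-- Written by the app's timer; a TTL of 604800 seconds expires the row after one week
INSERT INTO users_by_day (app_date, hour, minute, user_count)
VALUES ('2020-02-06', 21, 15, 1342)
USING TTL 604800;
```

Both reads supply the full partition key, so neither needs ALLOW FILTERING.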


Sean Durity – Staff Systems Engineer, Cassandra

From: Abdul Patel <abd786.ap@gmail.com>
Sent: Thursday, February 6, 2020 1:54 PM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: Running select against cassandra

It's sort of connected users; the app team needs the number of active users connected every
1 to 5 minutes, say.
The timeout at the app end is 120 ms.



On Thursday, February 6, 2020, Michael Shuler <michael@pbandjelly.org> wrote:
You'll have to be more specific. What is your table schema and what is the SELECT query? What
is the normal response time?

As a basic guide for your general question, if the query is something sort of irrelevant that
should be stored some other way, like a total row count, or most any SELECT that requires
ALLOW FILTERING, you're doing it wrong and should re-evaluate your data model.
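
For illustration only, this is the kind of query that warning applies to, written against the usr_session schema posted elsewhere in this thread. Treating status = 1 as "connected" is an assumption; the thread does not say what the status values mean.

```cql
-- Anti-pattern: status is not part of the primary key (userid, session_usr),
-- so Cassandra has to scan every partition to answer this
SELECT count(*) FROM usr_session WHERE status = 1 ALLOW FILTERING;
```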

1 query per minute is a minuscule fraction of the queries per minute that a Cassandra cluster
should be able to handle with good data modeling and table-relevant queries. It all depends
on the data model and query.

Michael

On 2/6/20 12:20 PM, Abdul Patel wrote:
Hi,

Is it advisable to run a select query every minute to fetch data from Cassandra for
reporting purposes? If not, what's the alternative?

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org

________________________________

The information in this Internet Email is confidential and may be legally privileged. It is
intended solely for the addressee. Access to this Email by anyone else is unauthorized. If
you are not the intended recipient, any disclosure, copying, distribution or any action taken
or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed
to our clients any opinions or advice contained in this Email are subject to the terms and
conditions expressed in any applicable governing The Home Depot terms of business or client
engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy
and content of this attachment and for any damages or losses arising from any inaccuracies,
errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature,
which may be contained in this attachment and shall not be liable for direct, indirect, consequential
or special damages in connection with this e-mail message or its attachment.
