cassandra-user mailing list archives

From Abdul Patel <abd786...@gmail.com>
Subject Re: Running select against cassandra
Date Fri, 07 Feb 2020 02:00:54 GMT
Thanks, all, for the valuable input.
I agree we need to define the queries first and then plan the table schema, but
the server has been live in production for 2 years and this is a new requirement,
so changing the schema is not an option, and a secondary index is also a bad idea.

I was thinking of going with a materialized view, or testing how a plain SELECT
performs in non-prod, and seeing which fares better.
So I wanted to see if we can do anything other than that with the existing schema.
The COPY option was also discussed, but COPY doesn't support a WHERE clause.
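
For reference, the summary-table idea from the quoted replies (the app writes a count once a minute, with a TTL) could be sketched roughly as below. This is only a minimal sketch under assumptions, not something tested against our cluster: it assumes a users_by_day table keyed by (app_date, hour, minute) as Sean proposed, it only builds the statement text and bind values and leaves execution to whatever driver is in use, and the function names, the %s placeholder style, and the one-week TTL are all hypothetical choices.

```python
from datetime import datetime, timezone

# Hypothetical retention: keep one week of per-minute counts.
TTL_SECONDS = 7 * 24 * 60 * 60

# Assumed table: users_by_day keyed by (app_date, hour, minute).
# The %s placeholders are an assumption about the client driver;
# a prepared statement would typically use ? instead.
INSERT_CQL = (
    "INSERT INTO users_by_day (app_date, hour, minute, user_count) "
    "VALUES (%s, %s, %s, %s) USING TTL %s"
)
SELECT_CQL = (
    "SELECT user_count FROM users_by_day "
    "WHERE app_date = %s AND hour = %s AND minute = %s"
)


def minute_bucket(ts):
    """Split a timezone-aware timestamp into the (app_date, hour, minute) key, in UTC."""
    ts = ts.astimezone(timezone.utc)
    return ts.date(), ts.hour, ts.minute


def build_count_write(ts, user_count, ttl=TTL_SECONDS):
    """Return (cql, params) for recording the active-user count of one minute."""
    app_date, hour, minute = minute_bucket(ts)
    return INSERT_CQL, (app_date, hour, minute, user_count, ttl)


def build_count_read(ts):
    """Return (cql, params) for reading back the count of one minute."""
    return SELECT_CQL, minute_bucket(ts)
```

The read touches a single partition and a single row, which is the property the replies below are arguing for; whether the app can reasonably produce the count each minute is a separate question.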


On Thursday, February 6, 2020, Reid Pinchback <rpinchback@tripadvisor.com>
wrote:

> I defer to Sean’s comment on materialized views.  I’m more familiar with
> DynamoDB on that front, where you do this pretty routinely.  I was curious
> so I went looking. This appears to be the C* Jira that points to many of
> the problem points:
>
>
>
> https://issues.apache.org/jira/browse/CASSANDRA-13826
>
>
>
> Abdul, you’d probably want to refer to that or similar info.  Could be
> that the more practical resolution is to just have the client write the
> data twice, if there are two very different query patterns to support.
> Writes usually have quite low latency in C*, so double-writing may be less
> of a performance hit, and later drag on memory or I/O, than a query model
> that makes you browse through more data than necessary.
>
>
>
> *From: *"Durity, Sean R" <SEAN_R_DURITY@homedepot.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Thursday, February 6, 2020 at 4:24 PM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *RE: [EXTERNAL] Re: Running select against cassandra
>
>
>
> *Message from External Sender*
>
> Reid is right. You build the tables to easily answer the queries you want.
> So, start with the query! I inferred a query for you based on what you
> mentioned. If my inference is wrong, the table structure is likely wrong,
> too.
>
>
>
> So, what kind of query do you want to run?
>
>
>
> (NOTE: a select count(*) that is not restricted to within a single
> partition is a very bad option. Don’t do that)
>
>
>
> The query for my table below is simply:
>
> select user_count [, other columns] from users_by_day where app_date = ? and
> hour = ? and minute = ?
>
>
>
>
>
> Sean Durity
>
>
>
> *From:* Reid Pinchback <rpinchback@tripadvisor.com>
> *Sent:* Thursday, February 6, 2020 4:10 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: [EXTERNAL] Re: Running select against cassandra
>
>
>
> Abdul,
>
>
>
> When in doubt, have a query model that immediately feeds you exactly what
> you are looking for. That’s kind of the data model philosophy that you want
> to shoot for as much as feasible with C*.
>
>
>
> The point of Sean’s table isn’t the similarity to yours, it is how he has
> it keyed because it suits a partition structure much better aligned with
> what you want to request.  So I’d say yes, if a materialized view is how
> you want to achieve a denormalized state where the query model directly
> supports giving you what you want to query for, that sounds like an
> appropriate option to consider.  You might want a composite partition key
> for having an efficient selection of narrow time ranges.
>
>
>
> *From: *Abdul Patel <abd786.ap@gmail.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Thursday, February 6, 2020 at 2:42 PM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: [EXTERNAL] Re: Running select against cassandra
>
>
>
> *Message from External Sender*
>
> This is a schema similar to what we have. They want the count of
> concurrently connected users every 1-5 minutes or so.
>
> I am wondering whether a simple select will have performance issues, or
> whether we should go for materialized views?
>
>
>
> CREATE TABLE usr_session (
>     userid bigint,
>     session_usr text,
>     last_access_time timestamp,
>     login_time timestamp,
>     status int,
>     PRIMARY KEY (userid, session_usr)
> ) WITH CLUSTERING ORDER BY (session_usr ASC);
>
>
>
>
>
> On Thu, Feb 6, 2020 at 2:09 PM Durity, Sean R <SEAN_R_DURITY@homedepot.com>
> wrote:
>
> Do you only need the current count or do you want to keep the historical
> counts also? By active users, does that mean some kind of user that the
> application tracks (as opposed to the Cassandra user connected to the
> cluster)?
>
>
>
> I would consider a table like this for tracking active users through time:
>
>
>
> CREATE TABLE users_by_day (
>     app_date date,
>     hour int,
>     minute int,
>     user_count int,
>     longest_login_user text,
>     longest_login_seconds int,
>     last_login timestamp,
>     last_login_user text,
>     PRIMARY KEY (app_date, hour, minute)
> );
>
>
>
> Then, your reporting can easily select full days or a specific, one-minute
> slice. Of course, the app would need to have a timer and write out the
> data. I would also suggest a TTL on the data so that you only keep what you
> need (a week, a year, whatever). Of course, if your reporting requires
> different granularities, you could consider a different time bucket for the
> table (by hour, by week, etc.)
>
>
>
>
>
> Sean Durity – Staff Systems Engineer, Cassandra
>
>
>
> *From:* Abdul Patel <abd786.ap@gmail.com>
> *Sent:* Thursday, February 6, 2020 1:54 PM
> *To:* user@cassandra.apache.org
> *Subject:* [EXTERNAL] Re: Running select against cassandra
>
>
>
> It's sort of users connected; the app team needs the number of active users
> connected, say, every 1 to 5 minutes.
>
> The timeout at app end is 120ms.
>
>
>
>
>
> On Thursday, February 6, 2020, Michael Shuler <michael@pbandjelly.org>
> wrote:
>
> You'll have to be more specific. What is your table schema and what is the
> SELECT query? What is the normal response time?
>
> As a basic guide for your general question, if the query is something sort
> of irrelevant that should be stored some other way, like a total row count,
> or most any SELECT that requires ALLOW FILTERING, you're doing it wrong and
> should re-evaluate your data model.
>
> 1 query per minute is a minuscule fraction of the basic capacity of
> queries per minute that a Cassandra cluster should be able to handle with
> good data modeling and table-relevant query. All depends on the data model
> and query.
>
> Michael
>
> On 2/6/20 12:20 PM, Abdul Patel wrote:
>
> Hi,
>
> Is it advisable to run a select query every minute to fetch data
> from Cassandra for reporting purposes? If not, what's the alternative?
>
>
>
>
> ------------------------------
>
>
> The information in this Internet Email is confidential and may be legally
> privileged. It is intended solely for the addressee. Access to this Email
> by anyone else is unauthorized. If you are not the intended recipient, any
> disclosure, copying, distribution or any action taken or omitted to be
> taken in reliance on it, is prohibited and may be unlawful. When addressed
> to our clients any opinions or advice contained in this Email are subject
> to the terms and conditions expressed in any applicable governing The Home
> Depot terms of business or client engagement letter. The Home Depot
> disclaims all responsibility and liability for the accuracy and content of
> this attachment and for any damages or losses arising from any
> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or other
> items of a destructive nature, which may be contained in this attachment
> and shall not be liable for direct, indirect, consequential or special
> damages in connection with this e-mail message or its attachment.
>
>
