cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Naresh Yadav <nyadav....@gmail.com>
Subject Re: Query on Data Modelling of a specific usecase
Date Thu, 20 Apr 2017 08:22:48 GMT
Hi Jon,

Thanks for your guidance.

In above mentioned table i can have different scale depending on Report.

One report may have 10000 rows.
Second report may have half million rows.
Third report may have 1 million rows.
Fourth report may have 10 million rows.

As this is timeseries data that was main reason of modelling in cassandra.
We preferred separate table for each report as there is no usecase of
quering across reports and also Light reports will work faster.
I can plan to reduce no of tables drastically by combining lighter reports
in one table at application level.

If you can suggest optimal table design keeping one table in mind with 10
millions to 1 billion rows scale for the mentioned queries.

Thanks,
Naresh Yadav

On Wed, Apr 19, 2017 at 9:26 PM, Jon Haddad <jonathan.haddad@gmail.com>
wrote:

> How much data do you plan to store in each table?
>
> I’ll be honest, this doesn’t sound like a Cassandra use case at first
> glance.  1 table per report x 1000 is going to be a bad time.  Odds are
> with different queries, you’ll need multiple views, so lets call that a
> handful of tables per report.  Sounds to me like you need CSV (for small
> reports) or Parquet + a file system (for large ones).
>
> Jon
>
>
> On Apr 18, 2017, at 11:34 PM, Naresh Yadav <nyadav.ait@gmail.com> wrote:
>
> Looking for cassandra expert's recommendation on above usecase, please
> reply.
>
> On Mon, Apr 17, 2017 at 7:37 PM, Naresh Yadav <nyadav.ait@gmail.com>
> wrote:
>
>> Hi all,
>>
>> This is my existing table configured on apache-cassandra-3.0.9:
>>
>> CREATE TABLE report_id1 (
>>    mc_id text,
>>    tag_id text,
>>    e_date timestamp.
>>    value text
>>    PRIMARY KEY ((mc_id, tag_id), e_date)
>> }
>>
>> I create table dynamically for each report from application. Need to
>> support upto 1000 reports means 1000 such tables.
>> unique mc_id will be in range of 5 to 100 in a report.
>> For a mc_id there will be unique tag_id in range of 100 to 1 million in a
>> report.
>> For a mc_id, tag_id there will be unique e_date values in range of 10 to
>> 5000.
>>
>> Current queries to answer :
>> 1)SELECT * FROM report_id1 WHERE mc_id='x' AND tag_id IN('a','b','c') AND
>> e_date='16Apr2017 23:59:59';
>> 2)SELECT * FROM report_id1 WHERE mc_id='x' AND tag_id IN('a','b','c') AND
>> e_date >='01Apr2017 00:00:00' AND e_date <='16Apr2017 23:59:59;
>>
>> 3)SELECT * FROM report_id1 WHERE mc_id='x' AND e_date='16Apr2017
>> 23:59:59';
>>            Current design this works with ALLOW FILTERING ONLY
>> 4)SELECT * FROM report_id1 WHERE mc_id='x' AND e_date >='01Apr2017
>> 00:00:00' AND e_date <='16Apr2017 23:59:59';
>>            Current design this works with ALLOW FILTERING ONLY
>>
>> Looking for better design for this case, keeping in mind dynamic tables
>> usecase and queries listed.
>>
>> Thanks in advance,
>> Naresh
>>
>>
>
>

Mime
View raw message