cassandra-user mailing list archives

From: Jason Kania <>
Subject: Re: Strategy for dividing wide rows beyond just adding to the partition key
Date: Sat, 12 Mar 2016 15:22:36 GMT
Hi Carlos,
Thanks for the suggestions.
We are having partition size issues, and that is why we started doing custom sharding/partition
division based on time. As you mentioned, we are having problems with identification. It's
the identification of the shard range that we need to understand, and our data doesn't necessarily
run up to the current time. My worry with storing the last shard id in another table is that
we would be updating the same row in that table all the time, creating tombstones.
It is good to know that returning empty partitions is not that costly, as that is a concern
when we don't know where to start and end.

      From: Carlos Alonso <>
 To: "" <> 
 Sent: Friday, March 11, 2016 7:24 PM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the partition key
Hi Jason,
If I understand correctly you have no problems with the size of your partitions or transactional
queries but with the 'identification' of them when having to do analytical queries.
I'd then suggest two options:

1. Keep using Cassandra and store the first 'bucket' of each sensor in a separate table to use as the starting point of your full scan queries (see the sketch below). Then issue async queries incrementing the bucket until today (logical end of the data). Cassandra is very efficient at returning empty partitions, so querying on empty buckets is normally fine.

2. Periodically offload your 'historic' data to another storage more appropriate for analytics (Parquet + S3) and query it using Spark.
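For option 1, the lookup table and the per-bucket read could be roughly this (the sensorFirstBucket table and firstTimeShard column are just placeholders, adjust to your schema):

create table sensorFirstBucket (
    sensorUnitId int,
    sensorId int,
    firstTimeShard int,
    primary key ((sensorUnitId, sensorId))
);

-- hypothetical lookup: one row per (sensorUnitId, sensorId), read once to find the starting bucket
select firstTimeShard from sensorFirstBucket
where sensorUnitId = ? and sensorId = ?;

-- then issue one (async) query per bucket, from firstTimeShard up to the current week
select time, readings from sensorReadings
where sensorUnitId = ? and sensorId = ? and timeShard = ?;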
Hope it helps
On Saturday, 12 March 2016, Jack Krupansky <> wrote:

Thanks for the additional information, but there is still not enough color on the queries
and too much focus on a premature data model.
Is this 5000 readings for a single sensor of a single sensor unit, or for all sensors of a
specified unit, or... both?
I presume you want "next" and "previous" 5000 readings as well as first and last, but... you
will have to confirm that.
One technique is to store the bulk of your raw sensor data in a separate table and then simply
store the PK of that data in your time series. That way you can have a much wider row of time
series (number of rows) without hitting a bulk size issue for the partition. But... I don't
want to jump to solutions until we have a firmer handle on the query side of the fence.
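Purely to illustrate the shape of that technique and nothing more (table names and the readingId column are made up for illustration), it would be something like:

-- illustrative only: bulk readings stored once, keyed by their own id
create table sensorReadingData (
    readingId timeuuid,
    readings blob,
    primary key (readingId)
);

-- the time series row stays narrow: each entry just references the bulk row
create table sensorReadingSeries (
    sensorUnitId int,
    sensorId int,
    time timestamp,
    readingId timeuuid,
    primary key ((sensorUnitId, sensorId), time)
);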
-- Jack Krupansky
On Fri, Mar 11, 2016 at 5:37 PM, Jason Kania <> wrote:

Thanks for the response.
We are targeting our database design at 10000 sensor units, and each sensor unit has 32 sensors.
We are seeing about 700 events per day per sensor, each providing about 2K of data. Based
on keeping each partition to about 10 MB (based on readings we saw on performance), we chose
to break our partitions on a weekly basis. This is possibly finer than we need, as we were
seeing timeouts only once a single partition reached about 150 MB in size.
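To spell out the arithmetic behind the weekly choice: 700 events per day at about 2K each is roughly 1.4 MB per day per sensorUnitId/sensorId combination, so a 7-day shard lands at a little under 10 MB.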

When pulling in data, we will typically need to pull 1 to 4 months of data for our analysis,
using only the sensorUnitId and sensorId to uniquely identify the data source, with
the timeShard value used to break up our partitions. We have handling to sequentially scan
based on our "timeShard" value, but we don't have a good handle on determining the "timeShard"
portion of the partition key at read time. The data starts coming in when a subscriber starts
using our system and finishes when they discontinue service or put the service on hold temporarily.
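The scan itself is essentially one query per candidate timeShard, of this simplified shape:

select time, readings from sensorReadings
where sensorUnitId = ? and sensorId = ? and timeShard = ?;

so a 1 to 4 month pull is on the order of 5 to 18 of these, provided we know which timeShard values to plug in.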

When I talk about hotspots, it isn't the time series data that is the concern; it is with
respect to storing the maximum and minimum timeShard values in another table for subsequent
lookup, or the cost of running the current implementation of SELECT DISTINCT. We need to run
queries such as getting the first or last 5000 sensor readings when we don't know the time
frame in which they occurred, so we cannot directly supply the timeShard portion of our partition key.

I appreciate your input,

      From: Jack Krupansky <>
 To: "" <> 
 Sent: Friday, March 11, 2016 4:45 PM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the partition key
I'll stay away from advising on a specific schema per se, but I'll stick to the advice that
you need to make sure that your queries depend solely on the columns of the primary
key or on relatively short slices/scans, rather than run the risk of very long scans or having
to process multiple partitions for a single query. That's canned advice to some extent, but it still applies.
Of course we generally wish to avoid hotspots, but with time series they are unavoidable.
I mean, sure, you could place successive events in separate partitions, but then you can't
do any kind of scanning/slicing.
But, events for separate sensors are not true hotspots in the traditional sense - unless you
have only a single sensor/unit.
After considering your queries, the next step is to consider the cardinality of your data
- how many sensors, how many units, rate of events, etc. That will feed back into the queries as
well, such as how big a slice or scan might be, as well as into the sizing of partitions.
So, how many sensor units do you expect, how many sensors per unit, and what rate of events
per sensor?
Try not to jump too quickly to specific solutions - there really is a method to understanding
all of this other stuff upfront.
-- Jack Krupansky
On Thu, Mar 10, 2016 at 12:39 PM, Jason Kania <> wrote:

Thanks for the response. I don't think I provided enough information, and I used the wrong terminology,
as your response is more the canned advice in response to Cassandra antipatterns.
To make this clearer, this is what we are doing:
create table sensorReadings (
    sensorUnitId int,
    sensorId int,
    time timestamp,
    timeShard int,
    readings blob,
    primary key ((sensorUnitId, sensorId, timeShard), time)
);
where timeShard is a combination of year and week of year
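For example, a reading taken in week 10 of 2016 might get a timeShard of 201610 (the exact packing is an example here; it only has to be consistent), so a known week maps straight to one partition:

select time, readings from sensorReadings
where sensorUnitId = 1 and sensorId = 1 and timeShard = 201610;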
For known time range based queries, this works great. However, the specific problem is in
knowing the maximum and minimum timeShard values when we want to select the entire range of
data. Our understanding is that if we update another related table with the maximum and minimum
timeShard value for a given sensorUnitId and sensorId combination, we will create a hotspot
and lots of tombstones. If we SELECT DISTINCT, we get a huge list of partition keys for the
table because we cannot reduce the scope with a where clause.
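In CQL terms, the only form available to us is the unrestricted one:

select distinct sensorUnitId, sensorId, timeShard from sensorReadings;

and since a where clause on sensorUnitId and sensorId is not accepted there, that means paging through every partition key in the table.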

If there is a recommended pattern that solves this, we haven't come across it.

I hope this makes the problem clearer.

      From: Jack Krupansky <>
 To:; Jason Kania <> 
 Sent: Thursday, March 10, 2016 10:42 AM
 Subject: Re: Strategy for dividing wide rows beyond just adding to the partition key
There is an effort underway to support wider rows:

This won't help you now though. Even with that improvement you still may need a more optimal
data model since large-scale scanning/filtering is always a very bad idea with Cassandra.
The data modeling methodology for Cassandra dictates that queries drive the data model and
that each form of query requires a separate table ("query table"). Materialized views can automate
that process for a lot of cases, but in any case it does sound as if some of your queries
do require additional tables.
As a general proposition, Cassandra should not be used for heavy filtering - query tables
with the filtering criteria baked into the PK are the way to go.
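As a purely illustrative example of a query table for your case (the sensorShards table below is hypothetical, not a recommendation), the "which shards exist for this sensorUnitId/sensorId" question would get its own small table, maintained at write time, so the read is a single partition:

-- hypothetical query table: one clustering row per shard that actually holds data
create table sensorShards (
    sensorUnitId int,
    sensorId int,
    timeShardId int,
    primary key ((sensorUnitId, sensorId), timeShardId)
);

-- written alongside each reading (or once per new shard), read back with
select timeShardId from sensorShards
where sensorUnitId = ? and sensorId = ?;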

-- Jack Krupansky
On Thu, Mar 10, 2016 at 8:54 AM, Jason Kania <> wrote:

We have sensor input that creates very wide rows, and operations on these rows have started
to time out regularly. We have been trying to find a solution for dividing wide rows but keep
hitting limitations that move the problem around instead of solving it.
We have a partition key consisting of a sensorUnitId and a sensorId and use a time field to
access each column in the row. We tried adding a time-based entry, timeShardId, to the partition
key, consisting of the year and week of year during which the reading was taken. This works
for a number of queries, but for scanning all the readings against a particular sensorUnitId
and sensorId combination, we seem to be stuck.
We won't know the range of valid values of the timeShardId for a given sensorUnitId and sensorId
combination, so we would have to write to an additional table to track the valid timeShardId values.
We suspect this would create tombstone accumulation problems given the number of updates required
to the same row, so we haven't tried this option.

Alternatively, we hit a different bottleneck in the form of SELECT DISTINCT in trying to directly
access the partition keys. Since SELECT DISTINCT does not allow for a where clause to filter
on the partition key values, we have to filter several hundred thousand partition keys just
to find those related to the relevant sensorUnitId and sensorId. This problem will only grow
worse for us.

Are there any other approaches that can be suggested? We have been looking around, but haven't
found any references beyond the initial suggestion to add some sort of shard id to the partition
key to handle wide rows.



Carlos Alonso | Software Engineer | @calonso
