cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nate McCall <>
Subject Re: Data Modeling: Partition Size and Query Efficiency
Date Tue, 05 Jan 2016 17:21:04 GMT
> In this case, 99% of my data could fit in a single 50 MB partition. But if
> I use the standard approach, I have to split my partitions into 50 pieces
> to accommodate the largest data. That means that to query the 700 rows for
> my median case, I have to read 50 partitions instead of one.
> If you try to deal with this by starting a new partition when an old one
> fills up, you have a nasty distributed consensus problem, along with
> read-before-write. Cassandra LWT wasn't available the last time I dealt
> with this, but might help with the consensus part today. But there are
> still some nasty corner cases.
> I have some thoughts on other ways to solve this, but they all have
> drawbacks. So I thought I'd ask here and hope that someone has a better
> approach.
Hi Jim - good to see you around again.

If you can segment this upstream by customer/account/whatever, handling the
outliers as an entirely different code path (potentially different cluster
as the workload will be quite different at that point and have different
tuning requirements) would be your best bet. Then a read-before-write makes
sense given it is happening on such a small number of API queries.

Nate McCall
Austin, TX

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting

View raw message