cassandra-user mailing list archives

From Janne Jalkanen <>
Subject Re: Read efficiency question
Date Fri, 30 Dec 2016 09:40:42 GMT
<html><head></head><body dir="auto" style="word-wrap: break-word; -webkit-nbsp-mode:
space; -webkit-line-break: after-white-space;"><meta http-equiv="Content-Type" content="text/html
charset=utf-8"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break:
after-white-space;" class=""><div class="">In practice, the performance you’re
getting is likely to be impacted by your reading patterns. &nbsp;If you do a lot of sequential
reads where key1 and key2 stay the same, and only key3 varies, then you may be getting better
performance out of the second option due to hitting the row and disk caches more often. If
you are doing a lot of scatter reads, then you’re likely to get better performance out of
the first option, because the reads will be distributed more evenly to multiple nodes. &nbsp;It
also depends on how large the rows you’re planning to use are, as this directly affects things
like compaction, which in turn affects the speed of the entire cluster. &nbsp;For just
a few values of key3, I doubt there would be much difference in performance, but if key3 has
a cardinality of say, a million, you might be better off with option 1.</div><div
class=""><br class=""></div><div class="">As always the advice is - benchmark
your intended use case: load a few hundred gigs of mock data into a cluster, trigger compactions
and do perf tests for different kinds of read/write loads. :-)</div><div class=""><br
class=""></div><div class="">(Though if I didn’t know what my read pattern
would be, I’d probably go for option 1 purely on a gut feeling if I was sure I would never
need range queries on key3; shorter rows *usually* are a bit better for performance, compaction,
etc. &nbsp;Really wide rows can sometimes be a headache operationally.)</div><br
class=""><div class="">
<div class="">May you have energy and success!</div><div class="">/Janne</div><div
class=""><br class=""></div><br class="Apple-interchange-newline">


<br class=""><div><blockquote type="cite" class=""><div class="">On
28 Dec 2016, at 16:44, Manoj Khangaonkar &lt;<a href=""
class=""></a>&gt; wrote:</div><br class="Apple-interchange-newline"><div
class=""><div dir="ltr" class="">In the first case, the partitioning is based on
key1,key2,key3.<div class=""><br class=""></div><div class="">In the
second case, partitioning is based on key1, key2. Additionally you have a clustering key, key3.
This means within a partition you can do range queries on key3 efficiently. That is the difference.</div><div
class=""><br class=""></div><div class="">regards</div></div><div
class="gmail_extra"><br class=""><div class="gmail_quote">On Tue, Dec 27, 2016
at 7:42 AM, Voytek Jarnot <span dir="ltr" class="">&lt;<a href=""
target="_blank" class=""></a>&gt;</span> wrote:<br
class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc
solid;padding-left:1ex"><div dir="ltr" class="">Wondering if there's a difference
when querying by primary key between the two definitions below:<br class=""><br class="">primary
key ((key1, key2, key3))<div class="">primary key ((key1, key2), key3)</div><div
class=""><br class=""></div><div class="">In terms of read speed/efficiency...
I don't have much of a reason otherwise to prefer one setup over the other, so would prefer
the most efficient for querying.</div><div class=""><br class=""></div><div
</blockquote></div><br class=""><br clear="all" class=""><div class=""><br
class=""></div>-- <br class=""><div class="gmail_signature" data-smartmail="gmail_signature"><a
href="" target="_blank" class=""></a></div>
</div></blockquote></div><br class=""></div></body></html>
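
To make the difference between the two definitions in this thread concrete, here is a minimal Python sketch of how rows end up grouped under each one. It is a toy model and not how Cassandra is implemented: the MD5-based token() stands in for the real partitioner, "token modulo node count" stands in for token-range ownership, and NUM_NODES, layout() and range_query() are hypothetical names made up for illustration.

```python
import hashlib
from bisect import bisect_left, bisect_right
from collections import defaultdict

NUM_NODES = 6  # hypothetical cluster size, chosen only for illustration


def token(partition_key):
    """Hash a partition key tuple to an integer (MD5 as a stand-in partitioner)."""
    raw = "\x00".join(str(part) for part in partition_key).encode("utf-8")
    return int.from_bytes(hashlib.md5(raw).digest()[:8], "big")


def partition_key(row, option):
    """Option 1: ((key1, key2, key3)). Option 2: ((key1, key2), key3)."""
    key1, key2, key3 = row
    return (key1, key2, key3) if option == 1 else (key1, key2)


def layout(rows, option):
    """Group key3 values by partition and collect the nodes those partitions map to."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row, option)].append(row[2])
    # Simplification: real clusters assign token ranges to nodes, not token % N.
    nodes = {token(pk) % NUM_NODES for pk in partitions}
    return partitions, nodes


def range_query(partitions, key1, key2, low, high):
    """Range scan on key3 inside one (key1, key2) partition (option 2 only)."""
    clustered = sorted(partitions.get((key1, key2), []))
    return clustered[bisect_left(clustered, low):bisect_right(clustered, high)]


# Same key1 and key2, one thousand different key3 values.
rows = [("a", "b", key3) for key3 in range(1000)]
parts1, nodes1 = layout(rows, 1)
parts2, nodes2 = layout(rows, 2)

print("option 1:", len(parts1), "partitions on", len(nodes1), "nodes")
print("option 2:", len(parts2), "partitions on", len(nodes2), "nodes")
print("option 2 range on key3:", range_query(parts2, "a", "b", 10, 14))
```

Under these assumptions, option 1 scatters the rows over many small partitions (and so over several nodes), while option 2 keeps them in one wide partition where key3 is sorted and can be range-scanned. That matches the trade-off described in the replies above; it says nothing about actual latency, which still needs a benchmark.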