To: user@cassandra.apache.org
From: Alain Rastoul
Subject: Re: How can I scale my read rate?
Date: Mon, 20 Mar 2017 08:10:33 +0100

On 20/03/2017 02:35, S G wrote:
> 2) https://docs.datastax.com/en/developer/java-driver/3.1/manual/statements/prepared/
> tells me to avoid preparing select queries if I expect a change of
> columns in my table down the road.

The problem is also related to "select *", which is considered bad practice with most databases.

> I did some more testing to see if my client machines were the bottleneck.
> For a 6-node Cassandra cluster (each VM having 8 cores), I got 26,000
> reads/sec for all of the following:
> 1) Client nodes: 1, Threads: 60
> 2) Client nodes: 3, Threads: 180
> 3) Client nodes: 5, Threads: 300
> 4) Client nodes: 10, Threads: 600
> 5) Client nodes: 20, Threads: 1200
>
> So adding more client nodes or threads to those client nodes is not
> having any effect.
> I am suspecting Cassandra is simply not allowing me to go any further.
> Primary keys for my schema are:
> PRIMARY KEY((name, phone), age)
> name: text
> phone: int
> age: int

Yes, with such a primary key the data must be spread across the whole cluster (also taking the partitioner into account), so it is strange that the throughput doesn't scale. I guess you have also verified that you select data randomly?

Maybe you could have a look at the system traces to see the query plan for some requests. If you are on a test cluster, you can truncate the trace tables first (truncate system_traces.sessions; and truncate system_traces.events;), run a test, then select * from system_traces.events where session_id = xxxx, xxx being one of the sessions you pick in system_traces.sessions. Try to see if you are not always hitting the same nodes.

--
best,
Alain
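The tracing steps above could look roughly like the following cqlsh session. This is only a sketch: the session_id UUID is a placeholder you would replace with a real id picked from system_traces.sessions, and note that traces only appear if tracing is enabled (e.g. TRACING ON in cqlsh, or probabilistic tracing via nodetool settraceprobability on the server side):

```
-- On a TEST cluster only: clear out old traces first
TRUNCATE system_traces.sessions;
TRUNCATE system_traces.events;

-- ... run the read workload with tracing enabled ...

-- List recent trace sessions and pick a session_id
SELECT session_id, coordinator, duration
FROM system_traces.sessions
LIMIT 20;

-- Inspect the events for one session (placeholder UUID, substitute a real one)
SELECT activity, source, source_elapsed
FROM system_traces.events
WHERE session_id = 00000000-0000-0000-0000-000000000000;
```

Looking at the "source" column across several sessions should show whether the reads are being served by all nodes or keep landing on the same few coordinators/replicas.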