cassandra-user mailing list archives

From Eric Pederson <eric...@gmail.com>
Subject Re: Bottleneck for small inserts?
Date Fri, 26 May 2017 00:09:34 GMT
Due to a cut-and-paste error, those flamegraphs were a recording of the
whole system, not just Cassandra.  Throughput is approximately 30k
rows/sec.

Here are the graphs with just the Cassandra PID:

   - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva01_sars2.svg
   - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva02_sars2.svg
   - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva03_sars2.svg


And here are graphs during a cqlsh COPY FROM to the same table, using real
data, MAXBATCHSIZE=2.  Throughput is good at approximately 110k rows/sec.

   - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva01_cars_batch2.svg
   - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva02_cars_batch2.svg
   - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva03_cars_batch2.svg




-- Eric

On Thu, May 25, 2017 at 6:44 PM, Eric Pederson <ericacm@gmail.com> wrote:

> Totally understood :)
>
> I forgot to mention - I set the /proc/irq/*/smp_affinity mask to include
> all of the CPUs.  Actually most of them were set that way already (for
> example, 0000ffff,ffffffff) - it might be because irqbalance is
> running.  But for some reason the interrupts are all being handled on CPU 0
> anyway.
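
As a quick check on the mask quoted above, here is a minimal sketch that decodes an smp_affinity value into the CPU numbers it allows. The function name is hypothetical; the only assumption is the usual /proc format of comma-separated 32-bit hex groups, most significant group first.

```python
def cpus_in_affinity_mask(mask: str) -> list[int]:
    """Return the CPU numbers enabled in a /proc/irq/N/smp_affinity mask."""
    # The mask is printed as comma-separated 32-bit hex groups, most
    # significant group first, so dropping the commas gives one hex number.
    value = int(mask.strip().replace(",", ""), 16)
    # Bit n set means CPU n may service the interrupt.
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


if __name__ == "__main__":
    cpus = cpus_in_affinity_mask("0000ffff,ffffffff")
    print(len(cpus), cpus[0], cpus[-1])
```

For 0000ffff,ffffffff this yields CPUs 0 through 47, so the mask does cover all 48 cores. The mask only says which CPUs are allowed; it does not show where interrupts are actually delivered.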
>
> I see this in /var/log/dmesg on the machines:
>
>>
>> Your BIOS has requested that x2apic be disabled.
>> This will leave your machine vulnerable to irq-injection attacks.
>> Use 'intremap=no_x2apic_optout' to override BIOS request.
>> Enabled IRQ remapping in xapic mode
>> x2apic not enabled, IRQ remapping is in xapic mode
>
>
> In a reply to one of the comments on the article you linked, the author says:
>
>
>> When IO-APIC configured to spread interrupts among all cores, it can
>> handle up to eight cores. If you have more than eight cores, kernel will
>> not configure IO-APIC to spread interrupts. Thus the trick I described in
>> the article will not work.
>> Otherwise it may be caused by buggy BIOS or even buggy hardware.
>
>
> I'm not sure if either of them is relevant to my situation.
>
>
> Thanks!
>
>
>
>
>
> -- Eric
>
> On Thu, May 25, 2017 at 4:16 PM, Jonathan Haddad <jon@jonhaddad.com>
> wrote:
>
>> You shouldn't need a kernel recompile.  Check out the section "Simple
>> solution for the problem" in this article:
>> http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux
>> You can balance your requests across up to 8 CPUs.
>>
>> I'll check out the flame graphs in a little bit - in the middle of
>> something and my brain doesn't multitask well :)
>>
>> On Thu, May 25, 2017 at 1:06 PM Eric Pederson <ericacm@gmail.com> wrote:
>>
>>> Hi Jonathan -
>>>
>>> It looks like these machines are configured to use CPU 0 for all I/O
>>> interrupts.  I don't think I'm going to get the OK to compile a new kernel
>>> for them to balance the interrupts across CPUs, but to mitigate the problem
>>> I taskset the Cassandra process to run on all CPUs except 0.  It didn't
>>> change the performance though.  Let me know if you think it's crucial that
>>> we balance the interrupts across CPUs and I can try to lobby for a new
>>> kernel.
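
One way to confirm that CPU 0 is taking all the interrupts is to total the per-CPU columns of /proc/interrupts. The sketch below is illustrative only: the function name and the sample text are made up, and it assumes the standard layout (a header row of CPU names, then one row per IRQ with one counter per CPU).

```python
def interrupts_per_cpu(text: str) -> dict[str, int]:
    """Sum interrupt counts per CPU from the contents of /proc/interrupts."""
    lines = text.strip("\n").splitlines()
    cpus = lines[0].split()                  # header row: CPU0 CPU1 ...
    totals = {cpu: 0 for cpu in cpus}
    for line in lines[1:]:
        fields = line.split()[1:]            # drop the "NN:" row label
        counts = fields[:len(cpus)]
        # Skip rows that do not carry one numeric counter per CPU
        # (for example the ERR and MIS summary rows).
        if len(counts) < len(cpus) or not all(c.isdigit() for c in counts):
            continue
        for cpu, count in zip(cpus, counts):
            totals[cpu] += int(count)
    return totals


# Made-up sample; on a real host pass open("/proc/interrupts").read() instead.
SAMPLE = """\
           CPU0       CPU1
  0:        100          0   IO-APIC-edge      timer
 24:       5000          3   PCI-MSI-edge      eth0
ERR:          0
"""

if __name__ == "__main__":
    for cpu, total in interrupts_per_cpu(SAMPLE).items():
        print(cpu, total)
```

Running it twice a few seconds apart and comparing the totals shows which CPUs are servicing interrupts during the test.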
>>>
>>> Here are flamegraphs from each node from a cassandra-stress ingest into
>>> a table representative of what we are going to be using.  Rows in this
>>> table are also roughly 200 bytes, with 64 columns and the primary key
>>> (date, sequence_number).  Cassandra-stress was run on 3 separate client
>>> machines.  Using cassandra-stress to write to this table I see the same
>>> thing: neither disk, CPU, nor network is fully utilized.
>>>
>>>    - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva01_sars.svg
>>>    - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva02_sars.svg
>>>    - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva03_sars.svg
>>>
>>> Re: GC: In the stress run with the parameters above, two of the three
>>> nodes log zero or one GCInspector messages.  On the other hand, the 3rd
>>> machine logs a GCInspector every 5 seconds or so, 300-500ms each time.  I
>>> found out that the 3rd machine actually has different specs than the other
>>> two.  It's an older box with the same RAM but fewer CPUs (32 instead of
>>> 48), a slower SSD and slower memory.  The Cassandra configuration is
>>> exactly the same.  I tried running Cassandra with only 32 CPUs on the newer
>>> boxes to see if that would cause them to GC pause more, but it didn't.
>>>
>>> On a separate topic - for this cassandra-stress run I reduced the batch
>>> size to 2 in order to keep the logs clean.  That also reduced the
>>> throughput from around 100k rows/second to 32k rows/sec.  I've been doing
>>> ingestion tests using cassandra-stress, cqlsh COPY FROM and a custom
>>> C++ application.  In most of the tests that I've been doing I've been using
>>> a batch size of around 20 (unlogged, all batch rows have the same partition
>>> key).  However, it fills the logs with batch size warnings.  I was going to
>>> raise the batch warning size but the docs scared me away from doing that.
>>> Given that we're using unlogged/same partition batches is it safe to raise
>>> the batch size warning limit?   Actually cqlsh COPY FROM has very good
>>> throughput using a small batch size, but I can't get that same throughput
>>> in cassandra-stress or my C++ app with a batch size of 2.
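
For reference, the batching scheme described above (small batches whose rows all share one partition key) can be sketched on the client side as follows. This is an illustration, not the code used in the tests: the helper name and the row layout are hypothetical, and sending each group as an unlogged batch is left to whichever driver is in use.

```python
from collections import defaultdict
from itertools import islice


def batches_by_partition(rows, partition_key, max_batch_size):
    """Yield lists of rows sharing a partition key, at most max_batch_size each."""
    groups = defaultdict(list)
    for row in rows:
        groups[partition_key(row)].append(row)
    for group in groups.values():
        it = iter(group)
        # Slice each partition's rows into chunks of max_batch_size.
        while chunk := list(islice(it, max_batch_size)):
            yield chunk


if __name__ == "__main__":
    # Hypothetical rows keyed by (date, sequence_number); date is the partition key.
    rows = [("2017-05-25", n) for n in range(5)] + [("2017-05-26", 0)]
    for batch in batches_by_partition(rows, lambda r: r[0], max_batch_size=2):
        print(batch)  # each list would be sent as one unlogged batch
```

Because every yielded list stays within one partition, changing max_batch_size from 2 to 20 only changes how many rows travel per request, which makes it easy to compare the two settings with the same input.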
>>>
>>> Thanks!
>>>
>>>
>>>
>>> -- Eric
>>>
>>> On Mon, May 22, 2017 at 5:08 PM, Jonathan Haddad <jon@jonhaddad.com>
>>> wrote:
>>>
>>>> How many CPUs are you using for interrupts?
>>>> http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux
>>>>
>>>> Have you tried making a flame graph to see where Cassandra is spending
>>>> its time?
>>>> http://www.brendangregg.com/blog/2014-06-12/java-flame-graphs.html
>>>>
>>>> Are you tracking GC pauses?
>>>>
>>>> Jon
>>>>
>>>> On Mon, May 22, 2017 at 2:03 PM Eric Pederson <ericacm@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all:
>>>>>
>>>>> I'm new to Cassandra and I'm doing some performance testing.  One of
>>>>> the things that I'm testing is ingestion throughput.  My server setup is:
>>>>>
>>>>>    - 3 node cluster
>>>>>    - SSD data (both commit log and sstables are on the same disk)
>>>>>    - 64 GB RAM per server
>>>>>    - 48 cores per server
>>>>>    - Cassandra 3.0.11
>>>>>    - 48 GB heap using G1GC
>>>>>    - 1 Gbps NICs
>>>>>
>>>>> Since I'm using SSD I've tried tuning the following (one at a time)
>>>>> but none seemed to make a lot of difference:
>>>>>
>>>>>    - concurrent_writes=384
>>>>>    - memtable_flush_writers=8
>>>>>    - concurrent_compactors=8
>>>>>
>>>>> I am currently doing ingestion tests sending data from 3 clients on
>>>>> the same subnet.  I am using cassandra-stress to do some ingestion
>>>>> testing.  The tests are using CL=ONE and RF=2.
>>>>>
>>>>> Using cassandra-stress (3.10) I am able to saturate the disk using a
>>>>> large enough column size and the standard five column cassandra-stress
>>>>> schema.  For example, -col size=fixed(400) will saturate the disk and
>>>>> compactions will start falling behind.
>>>>>
>>>>> One of our main tables has a row size of approximately 200 bytes,
>>>>> across 64 columns.  When ingesting this table I don't see any resource
>>>>> saturation.  Disk utilization is around 10-15% per iostat.  Incoming
>>>>> network traffic on the servers is around 100-300 Mbps.  CPU utilization
>>>>> is around 20-70%.  nodetool tpstats shows mostly zeros with occasional
>>>>> spikes around 500 in MutationStage.
>>>>>
>>>>> The stress run does 10,000,000 inserts per client, each with a
>>>>> separate range of partition IDs.  The run with 200 byte rows takes about
>>>>> 4 minutes, with mean latency 4.5 ms, total GC time of 21 secs, and avg GC
>>>>> time 173 ms.
>>>>>
>>>>> The overall performance is good - around 120k rows/sec ingested.  But
>>>>> I'm curious to know where the bottleneck is.  There's no resource
>>>>> saturation and nodetool tpstats shows only occasional brief
>>>>> queueing.  Is the rest just expected latency inside of Cassandra?
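
The figures quoted above can be cross-checked with some back-of-the-envelope arithmetic. The sketch below uses only numbers from this message; the in-flight estimate applies Little's Law (concurrency = throughput x latency) and assumes the 4.5 ms mean latency is per row, which the thread does not state.

```python
clients = 3
inserts_per_client = 10_000_000
run_seconds = 4 * 60          # "about 4 minutes"
row_bytes = 200               # approximate row size
mean_latency_s = 4.5e-3       # mean latency reported by cassandra-stress

# Aggregate ingest rate across the three clients.
rows_per_sec = clients * inserts_per_client / run_seconds

# Raw row payload sent by the clients, ignoring protocol and replication overhead.
payload_mbps = rows_per_sec * row_bytes * 8 / 1e6

# Little's Law: average number of requests in flight across the cluster.
in_flight = rows_per_sec * mean_latency_s

if __name__ == "__main__":
    print(f"{rows_per_sec:,.0f} rows/sec")
    print(f"{payload_mbps:.0f} Mbps of raw row payload")
    print(f"about {in_flight:.0f} requests in flight")
```

This gives roughly 125k rows/sec, in line with the reported 120k, and about 200 Mbps of raw payload in total, well under the three 1 Gbps NICs. Under the per-row latency assumption, throughput is bounded by how many requests the clients keep in flight (on the order of 560 here), so client concurrency is one candidate for the limit when no server resource is saturated.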
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -- Eric
>>>>>
>>>>
>>>
>
