cassandra-user mailing list archives

From Jonathan Haddad <...@jonhaddad.com>
Subject Re: Is my cluster normal?
Date Tue, 12 Jul 2016 19:33:06 GMT
Can you do:

iostat -dmx 2 10
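
For reference: -d gives the device report, -m reports throughput in MB/s, -x
adds extended statistics, and "2 10" means a 2-second interval, 10 samples.
The await and %util columns are the usual tell for a saturated EBS volume.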



On Tue, Jul 12, 2016 at 11:20 AM Yuan Fang <yuan@kryptoncloud.com> wrote:

> Hi Jeff,
>
> The reads are low because we do not have many read operations right
> now.
>
> The heap is only 4GB.
>
> MAX_HEAP_SIZE=4GB
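>
> For reference, that setting lives in cassandra-env.sh; a minimal sketch
> (the new-gen size here is illustrative, and the two are normally set as
> a pair):
>
> # conf/cassandra-env.sh
> MAX_HEAP_SIZE="4G"
> HEAP_NEWSIZE="800M"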
>
> On Thu, Jul 7, 2016 at 7:17 PM, Jeff Jirsa <jeff.jirsa@crowdstrike.com>
> wrote:
>
>> EBS IOPS scale with volume size.
>>
>>
>>
>> A 600G EBS volume only guarantees 1800 IOPS – if you’re exhausting those
>> on writes, you’re going to suffer on reads.
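>>
>> (That 1800 figure is the gp2 baseline math, assuming general-purpose SSD
>> volumes: 3 IOPS per provisioned GB, so 600 GB x 3 = 1800 IOPS, plus burst
>> credits on top for volumes under 1 TB.)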
>>
>>
>>
>> You have a 16G server, and probably a good chunk of that allocated to
>> heap. Consequently, you have almost no page cache, so your reads are going
>> to hit the disk. Your reads being very low is not uncommon if you have no
>> page cache – the default settings for Cassandra (64k compression chunks)
>> are really inefficient for small reads served off of disk. If you drop the
>> compression chunk size (4k, for example), you’ll probably see your read
>> throughput increase significantly, which will give you more IOPS for the
>> commitlog, so write throughput likely goes up, too.
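>>
>> A sketch of that change (keyspace and table names here are hypothetical;
>> only newly written SSTables pick up the new chunk size, so rewrite the
>> existing ones afterwards):
>>
>> cqlsh -e "ALTER TABLE my_ks.my_table WITH compression =
>>   {'class': 'LZ4Compressor', 'chunk_length_in_kb': 4};"
>> nodetool upgradesstables -a my_ks my_table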
>>
>>
>>
>>
>>
>>
>>
>> *From: *Jonathan Haddad <jon@jonhaddad.com>
>> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>> *Date: *Thursday, July 7, 2016 at 6:54 PM
>> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
>> *Subject: *Re: Is my cluster normal?
>>
>>
>>
>> What's your CPU looking like? If it's low, check your IO with iostat or
>> dstat. I know some people have used EBS and say it's fine, but I've been
>> burned too many times.
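>>
>> For example (nothing exotic, just the usual flags):
>>
>> iostat -x 2    # extended device stats every 2s; watch await and %util
>> dstat -cdl 2   # CPU, disk, and load average every 2s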
>>
>> On Thu, Jul 7, 2016 at 6:12 PM Yuan Fang <yuan@kryptoncloud.com> wrote:
>>
>> Hi Riccardo,
>>
>>
>>
>> Very low IO-wait. About 0.3%.
>>
>> No stolen CPU. It is a Cassandra-only instance. I did not see any
>> dropped messages.
>>
>>
>>
>>
>>
>> ubuntu@cassandra1:/mnt/data$ nodetool tpstats
>>
>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>> MutationStage                     1         1      929509244         0                 0
>> ViewMutationStage                 0         0              0         0                 0
>> ReadStage                         4         0        4021570         0                 0
>> RequestResponseStage              0         0      731477999         0                 0
>> ReadRepairStage                   0         0         165603         0                 0
>> CounterMutationStage              0         0              0         0                 0
>> MiscStage                         0         0              0         0                 0
>> CompactionExecutor                2        55          92022         0                 0
>> MemtableReclaimMemory             0         0           1736         0                 0
>> PendingRangeCalculator            0         0              6         0                 0
>> GossipStage                       0         0         345474         0                 0
>> SecondaryIndexManagement          0         0              0         0                 0
>> HintsDispatcher                   0         0              4         0                 0
>> MigrationStage                    0         0             35         0                 0
>> MemtablePostFlush                 0         0           1973         0                 0
>> ValidationExecutor                0         0              0         0                 0
>> Sampler                           0         0              0         0                 0
>> MemtableFlushWriter               0         0           1736         0                 0
>> InternalResponseStage             0         0           5311         0                 0
>> AntiEntropyStage                  0         0              0         0                 0
>> CacheCleanupExecutor              0         0              0         0                 0
>> Native-Transport-Requests       128       128      347508531         2          15891862
>>
>>
>>
>> Message type           Dropped
>>
>> READ                         0
>>
>> RANGE_SLICE                  0
>>
>> _TRACE                       0
>>
>> HINT                         0
>>
>> MUTATION                     0
>>
>> COUNTER_MUTATION             0
>>
>> BATCH_STORE                  0
>>
>> BATCH_REMOVE                 0
>>
>> REQUEST_RESPONSE             0
>>
>> PAGED_RANGE                  0
>>
>> READ_REPAIR                  0
>>
>>
>>
>> On Thu, Jul 7, 2016 at 5:24 PM, Riccardo Ferrari <ferrarir@gmail.com>
>> wrote:
>>
>> Hi Yuan,
>>
>>
>>
>> Your machine instance has 4 vCPUs, which is 4 threads (not cores!). Aside
>> from any Cassandra-specific discussion, a system load of 10 on a 4-thread
>> machine is way too much in my opinion. If that is the running average
>> system load, I would look deeper into the system details. Is it IO wait? Is
>> it stolen CPU? Is this a Cassandra-only instance, or are there other
>> processes pushing up the load?
>>
>> What does your "nodetool tpstats" say? How many dropped messages do you
>> have?
>>
>>
>>
>> Best,
>>
>>
>>
>> On Fri, Jul 8, 2016 at 12:34 AM, Yuan Fang <yuan@kryptoncloud.com> wrote:
>>
>> Thanks Ben! From the post, it seems they got slightly better but similar
>> results to mine. Good to know.
>>
>> I am not sure whether a little fine-tuning of the heap memory will help.
>>
>>
>>
>>
>>
>> On Thu, Jul 7, 2016 at 2:58 PM, Ben Slater <ben.slater@instaclustr.com>
>> wrote:
>>
>> Hi Yuan,
>>
>>
>>
>> You might find this blog post a useful comparison:
>>
>>
>> https://www.instaclustr.com/blog/2016/01/07/multi-data-center-apache-spark-and-apache-cassandra-benchmark/
>>
>>
>>
>> Although the focus is on Spark and Cassandra and multi-DC, there are also
>> some single DC benchmarks of m4.xl clusters, plus some discussion of how
>> we went about benchmarking.
>>
>>
>>
>> Cheers
>>
>> Ben
>>
>>
>>
>>
>>
>> On Fri, 8 Jul 2016 at 07:52 Yuan Fang <yuan@kryptoncloud.com> wrote:
>>
>> Yes, here is my stress test result:
>>
>> Results:
>>
>> op rate                   : 12200 [WRITE:12200]
>>
>> partition rate            : 12200 [WRITE:12200]
>>
>> row rate                  : 12200 [WRITE:12200]
>>
>> latency mean              : 16.4 [WRITE:16.4]
>>
>> latency median            : 7.1 [WRITE:7.1]
>>
>> latency 95th percentile   : 38.1 [WRITE:38.1]
>>
>> latency 99th percentile   : 204.3 [WRITE:204.3]
>>
>> latency 99.9th percentile : 465.9 [WRITE:465.9]
>>
>> latency max               : 1408.4 [WRITE:1408.4]
>>
>> Total partitions          : 1000000 [WRITE:1000000]
>>
>> Total errors              : 0 [WRITE:0]
>>
>> total gc count            : 0
>>
>> total gc mb               : 0
>>
>> total gc time (s)         : 0
>>
>> avg gc time(ms)           : NaN
>>
>> stdev gc time(ms)         : 0
>>
>> Total operation time      : 00:01:21
>>
>> END
>>
>>
>>
>> On Thu, Jul 7, 2016 at 2:49 PM, Ryan Svihla <rs@foundev.pro> wrote:
>>
>> Lots of variables you're leaving out.
>>
>>
>>
>> It depends on write size, whether you're using logged batches, what
>> consistency level, what RF, whether the writes come in bursts, etc.
>> However, that's all somewhat moot for determining "normal": really, you
>> need a baseline, as all of those variables end up mattering a huge amount.
>>
>>
>>
>> I would suggest using cassandra-stress as a baseline and going from there
>> depending on what those numbers say (just pick the defaults).
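>>
>> Something like the following is the usual starting point (the node
>> address is a placeholder; judging by the "Total partitions: 1000000"
>> quoted above, that run was of this shape):
>>
>> cassandra-stress write n=1000000 -node 10.0.0.1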
>>
>> Sent from my iPhone
>>
>>
>> On Jul 7, 2016, at 4:39 PM, Yuan Fang <yuan@kryptoncloud.com> wrote:
>>
>> Yes, it is about 8k writes per node.
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Jul 7, 2016 at 2:18 PM, daemeon reiydelle <daemeonr@gmail.com>
>> wrote:
>>
>> Are you saying 7k writes per node? or 30k writes per node?
>>
>>
>>
>>
>>
>>
>>
>> .......
>> Daemeon C.M. Reiydelle
>> USA (+1) 415.501.0198
>> London (+44) (0) 20 8144 9872
>>
>>
>>
>> On Thu, Jul 7, 2016 at 2:05 PM, Yuan Fang <yuan@kryptoncloud.com> wrote:
>>
>> Writes of 30k/second is the main thing.
>>
>>
>>
>>
>>
>> On Thu, Jul 7, 2016 at 1:51 PM, daemeon reiydelle <daemeonr@gmail.com>
>> wrote:
>>
>> Assuming you meant 100k, that is likely for something with 16mb of storage
>> (probably way too small) where the data is more than 64k and hence will
>> not fit into the row cache.
>>
>>
>>
>>
>>
>>
>>
>> .......
>> Daemeon C.M. Reiydelle
>> USA (+1) 415.501.0198
>> London (+44) (0) 20 8144 9872
>>
>>
>>
>> On Thu, Jul 7, 2016 at 1:25 PM, Yuan Fang <yuan@kryptoncloud.com> wrote:
>>
>>
>>
>> I have a cluster of 4 m4.xlarge nodes (4 vCPUs, 16 GB memory, and 600GB
>> SSD EBS).
>>
>> I can reach cluster-wide write requests of 30k/second and read requests of
>> about 100/second. The cluster OS load is constantly above 10. Are those
>> normal?
>>
>>
>>
>> Thanks!
>>
>>
>>
>>
>>
>> Best,
>>
>>
>>
>> Yuan
>>
>>
>> --
>>
>> ————————
>>
>> Ben Slater
>>
>> Chief Product Officer
>>
>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>>
>> +61 437 929 798
>>
>>
>
