cassandra-commits mailing list archives

From "Alan Boudreault (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-7386) JBOD threshold to prevent unbalanced disk utilization
Date Thu, 20 Nov 2014 12:22:35 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Boudreault updated CASSANDRA-7386:
---------------------------------------
    Attachment: test_regression_with_patch.jpg
                test_regression_no_patch.jpg

Devs, this is the result of my regression test without and with the patch. 

Note: the compaction concurrency is set to 4 and the throughput is unlimited.

h4. Test

* 12 disks total of 2G of size.
* Goal: run the following command to fill the disk:
cassandra-stress WRITE n=2000000 -col size=FIXED\(1000\) -mode native prepared cql3 -schema keyspace=r1

h5. Result -  No Patch

!test_regression_no_patch.jpg|thumbnail! 

All disks are filled in ~420 seconds. Cassandra-stress crashed with write timeouts at around
n=650000.

h5. Result -  With Patch

!test_regression_with_patch.jpg|thumbnail!

Cassandra-stress finished all its work (~13 minutes, n=2000000) and all disks are under 60%
disk usage.

Any idea what's going on? Am I doing something wrong in my test case?


> JBOD threshold to prevent unbalanced disk utilization
> -----------------------------------------------------
>
>                 Key: CASSANDRA-7386
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7386
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Chris Lohfink
>            Assignee: Robert Stupp
>            Priority: Minor
>             Fix For: 2.1.3
>
>         Attachments: 7386-2.0-v3.txt, 7386-2.0-v4.txt, 7386-2.0-v5.txt, 7386-2.1-v3.txt,
7386-2.1-v4.txt, 7386-2.1-v5.txt, 7386-v1.patch, 7386v2.diff, Mappe1.ods, mean-writevalue-7disks.png,
patch_2_1_branch_proto.diff, sstable-count-second-run.png, test1_no_patch.jpg, test1_with_patch.jpg,
test2_no_patch.jpg, test2_with_patch.jpg, test3_no_patch.jpg, test3_with_patch.jpg, test_regression_no_patch.jpg,
test_regression_with_patch.jpg
>
>
> Currently the disks are picked first by number of current tasks, then by free
space.  This helps with performance but can lead to large differences in utilization in some
(unlikely but possible) scenarios.  I've seen 55% to 10% and heard reports of 90% to 10% on
IRC.  This happens with both LCS and STCS (although my suspicion is that STCS makes it worse since it is
harder to keep balanced).
> I propose the algorithm change a little to have some maximum range of utilization where
it will pick by free space over load (acknowledging it can be slower).  So if a disk A is
30% full and disk B is 5% full it will never pick A over B until it balances out.
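
The selection rule proposed in the description above can be sketched roughly as follows. This is only an illustration of one possible reading, not the code in any of the attached patches: the Disk type, the pick_disk function and the max_spread value of 0.10 are all hypothetical, and utilization stands in for free space on the assumption that all disks are the same size.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    used_fraction: float  # 0.0 (empty) .. 1.0 (full); hypothetical field
    pending_tasks: int    # tasks currently writing to this disk; hypothetical field


def pick_disk(disks, max_spread=0.10):
    """Pick a target disk. max_spread is an assumed threshold, not a real setting."""
    least_used = min(d.used_fraction for d in disks)
    most_used = max(d.used_fraction for d in disks)

    if most_used - least_used > max_spread:
        # Utilization has drifted too far apart: ignore load and
        # pick the emptiest disk until the disks balance out again.
        return min(disks, key=lambda d: d.used_fraction)

    # Within the allowed range, keep the behaviour described as current:
    # fewest running tasks first, then the disk with the most free space.
    return min(disks, key=lambda d: (d.pending_tasks, d.used_fraction))
```

With disk A at 30% and disk B at 5%, this sketch returns B even when B has more pending tasks. Once the two are within max_spread of each other, the task count decides again.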



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
