cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jeremy Davis <jerdavis.cassan...@gmail.com>
Subject Re: Using a separate commit log drive was 4x slower
Date Wed, 11 Aug 2010 22:06:37 GMT
Thanks for all the feedback, I'll be back in 2 weeks and pick up then.
-JD

On Tue, Aug 10, 2010 at 3:45 PM, Peter Schuller <peter.schuller@infidyne.com
> wrote:

> > Yeah, it has a BBU, and it is charged and on..
> > Very odd behavior, I'm stumped.
>
> I advise double-checking raid volume settings and ensuring that policy
> is truly such that the write cache is used. This may also be a
> function of kernel driver settings depending on what RAID
> controller/kernel version you have (for example make sure that an
> fsync() doesn't result in asking the RAID controller to flush caches
> regardless of BBU state). In any case, these stats:
>
> Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s
> avgrq-sz avgqu-sz   await  svctm  %util
> cciss/c0d0        0.00   908.50    0.00  110.50     0.00  8152.00
> 73.77     0.57    5.20   4.93  54.50
>
> ... are highly inconsistent with write-back caching under the
> assumption that the writes are indeed the sequential writes to the
> commit log. Write counts in the ~ 100/second ballpark, with no reads,
> average request size of 74, an average transaction time of 4.93 and a
> utilization of 54% *really really* sounds like the I/O is going all
> the way down to the platter. Either that or the RAID firmware/driver
> is not doing its job properly.
>
> I've attached a small script (note: UGLY hack since I just whipped it
> up, not a proper tool, but it does the job for this) that you can run
> to test it:
>
>   ./writelat.py /path/to/file_to_write_to # warning, file will be
> truncated/destroyed if it exists
>
> If you run this on an idle system and you're truly doing write-back
> caching, you should see numbers BELOW 1 (i.e., sub-millisecond times)
> (but you can ignore the first sample probably).
>
> It writes one 8kb chunk of data, fsync:s, writes another 8kb, fsyncs,
> etc. I predict you'll see numbers in the 3-7 millisecond range.
>
> Sample output for me with a plain old SATA drive (and ZFS on FreeBSD)
> follows.
>
> 33.8141918182
> 8.32605361938
> 8.44812393188
> 8.44788551331
> 8.40210914612
> 8.4490776062
> 8.38303565979
> 8.57901573181
> 8.20922851562
> 8.21614265442
> 10.0581645966
> 8.37683677673
> 8.50605964661
> 8.376121521
> 9.86790657043
> 8.43715667725
> 8.45789909363
> 8.40520858765
> 8.4171295166
> 8.46195220947
> 8.41498374939
> 8.46099853516
> 8.44287872314
> 8.43000411987
> 8.455991745
>
> --
> / Peter Schuller
>

Mime
View raw message