Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: local policy includes SPF record at
 spf.trusted-forwarder.org)
Message-Id: <1400269568.17101.118281509.21488E2B@webmail.messagingengine.com>
From: Ariel Weisberg <ariel@weisberg.ws>
To: user@cassandra.apache.org
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Type: multipart/alternative; boundary="_----------=_1400269568171010";
 charset="utf-8"
Subject: Re: Best partition type for Cassandra with JBOD
Date: Fri, 16 May 2014 15:46:08 -0400
In-Reply-To: 
 <CAAZU44m36frNVVUZPReafmPc1P01PZi2oMQ2gKm4oBJD-oNiQQ@mail.gmail.com>
References: 
 <32d8851150cd4ed187c16c1cd6706356@DM2PR03MB318.namprd03.prod.outlook.com>
 <CAAZU44m36frNVVUZPReafmPc1P01PZi2oMQ2gKm4oBJD-oNiQQ@mail.gmail.com>

--_----------=_1400269568171010
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Hi,


Recommending nobarrier (mount option barrier=0) when you don't know
whether a non-volatile cache is in play is probably not the way to go.
If a block device has a non-volatile write cache and is configured to
cache writes, it will typically ignore write barriers anyway.
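
Before changing anything, it helps to see whether barriers are already off on a given mount. Here is a minimal Python sketch that parses one line of /proc/mounts and flags either spelling used in this thread (nobarrier, or barrier=0 as ext3/ext4 accept it). The function names are hypothetical and the sample lines are made up; it only inspects text and changes nothing.

```python
from typing import Dict, Optional


def parse_mount_options(line: str) -> Dict[str, Optional[str]]:
    """Split the options field of a /proc/mounts line into {name: value}."""
    # /proc/mounts fields: device, mount point, fs type, options, dump, pass
    fields = line.split()
    opts: Dict[str, Optional[str]] = {}
    for opt in fields[3].split(","):
        name, sep, value = opt.partition("=")
        opts[name] = value if sep else None  # flag-style options map to None
    return opts


def barriers_disabled(line: str) -> bool:
    """True if the mount turns write barriers off (nobarrier or barrier=0)."""
    opts = parse_mount_options(line)
    return "nobarrier" in opts or opts.get("barrier") == "0"
```

Feed it each line of /proc/mounts on a node. It deliberately says nothing about whether the device has a non-volatile cache, which is the question that actually matters.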


I am also skeptical that you will see a performance boost. Applications
that want to defer and batch writes don't emit write barriers
frequently, and when they do, it's because the data has to be durable.
Filesystems do depend on write barriers, although, because of the way
journals are managed, it is surprisingly hard to get a reordering that
is really bad.


Cassandra uses log-structured storage and supports asynchronous
periodic group commit, so it doesn't need to emit write barriers
frequently.
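
To illustrate why periodic group commit keeps barrier traffic low, here is a toy Python log. It is a hypothetical class, not Cassandra's CommitLog: it appends records immediately but calls fsync at most once per period, so many writes share one flush. In Cassandra the analogous settings are commitlog_sync: periodic and commitlog_sync_period_in_ms in cassandra.yaml. The injectable clock is an assumption added only to make the behaviour easy to test.

```python
import os
import time
from typing import Callable


class PeriodicCommitLog:
    """Toy append-only log that fsyncs at most once per period (group commit).

    Hypothetical illustration only: appends go to the OS page cache right
    away, and one fsync makes every append since the previous sync durable.
    """

    def __init__(self, path: str, period_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._f = open(path, "ab")
        self._period_s = period_s
        self._clock = clock
        self._last_sync = clock()
        self.syncs = 0  # number of fsync calls issued so far

    def append(self, record: bytes) -> None:
        self._f.write(record)
        # Sync only once the period has elapsed; appends in between share it.
        if self._clock() - self._last_sync >= self._period_s:
            self.sync()

    def sync(self) -> None:
        self._f.flush()             # hand Python's buffer to the kernel
        os.fsync(self._f.fileno())  # ask the kernel to make it durable
        self._last_sync = self._clock()
        self.syncs += 1

    def close(self) -> None:
        self.sync()
        self._f.close()
```

With a 10-second period and one append per second, 100 appends cost 10 fsyncs instead of 100. The trade-off is that up to one period of already-acknowledged writes can be lost if the node loses power.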


Setting read-ahead to zero on an SSD is necessary to get the maximum
number of random reads per second, but it also disables prefetching
for sequential reads. You need much less prefetching with an SSD
because of its far faster response time, but that response time is
still many microseconds.
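
To check the current read-ahead, here is a minimal Python sketch, assuming a Linux host. /sys/block/<dev>/queue/read_ahead_kb holds the value in KiB, while blockdev --getra reports it in 512-byte sectors, so the two numbers differ by a factor of two. The helper names are hypothetical, and sysfs_root is a parameter only so the function can be pointed at a test directory.

```python
import os

SECTOR_BYTES = 512  # blockdev --getra / --setra count 512-byte sectors


def sectors_to_kb(sectors: int) -> int:
    """Convert a blockdev --getra value into read_ahead_kb units (KiB)."""
    return sectors * SECTOR_BYTES // 1024


def read_ahead_kb(device: str, sysfs_root: str = "/sys") -> int:
    """Read the current read-ahead, in KiB, for a device name such as 'sdb'."""
    path = os.path.join(sysfs_root, "block", device, "queue", "read_ahead_kb")
    with open(path) as f:
        return int(f.read().strip())
```

Setting it to zero is then blockdev --setra 0 /dev/sdb, or writing 0 to read_ahead_kb as root (sdb is just an example device).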


Someone with more Cassandra-specific knowledge can probably give
better advice on when a non-zero read-ahead makes sense with Cassandra.
It may be workload-specific as well.


Regards,

Ariel


On Fri, May 16, 2014, at 01:55 PM, Kevin Burton wrote:

That and nobarrier… and probably noop for the scheduler if using SSD
and setting readahead to zero...
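
To check which I/O scheduler a disk is using on Linux, read /sys/block/<dev>/queue/scheduler. It lists the available schedulers with the active one in square brackets (for example "noop deadline [cfq]"). A minimal Python sketch of parsing that file, with a hypothetical function name:

```python
def active_scheduler(contents: str) -> str:
    """Return the bracketed (active) scheduler from a sysfs scheduler file."""
    for name in contents.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    raise ValueError("no active scheduler marked in %r" % contents)
```

Switching a disk to noop is a matter of writing noop to that file as root, e.g. echo noop > /sys/block/sdb/queue/scheduler (device name assumed).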


On Fri, May 16, 2014 at 10:29 AM, James Campbell
<[1]james@breachintelligence.com> wrote:

Hi all,


What partition type is best/most commonly used for a multi-disk JBOD
setup running Cassandra on CentOS 64bit?


The DataStax production server guidelines recommend XFS for data
partitions, saying, “Because Cassandra can use almost half your disk
space for a single file, use XFS when using large disks, particularly
if using a 32-bit kernel. XFS file size limits are 16TB max on a 32-bit
kernel, and essentially unlimited on 64-bit.”


However, the same document also notes that “Maximum recommended
capacity for Cassandra 1.2 and later is 3 to 5TB per node,” which makes
me think >16TB file sizes would be irrelevant (especially when not
using RAID to create a single large volume). What has been the
experience of this group?
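
The arithmetic behind that hunch, as a rough Python sketch. It assumes one file can grow to about half of the disk it lives on (the "almost half" from the guideline), that each file sits on a single JBOD disk, and that data is spread evenly across disks. All names are hypothetical.

```python
XFS_32BIT_FILE_LIMIT_TB = 16.0  # figure quoted from the DataStax guidelines


def largest_file_tb(disk_tb: float, fraction: float = 0.5) -> float:
    """Rough worst case for one file: 'almost half' of the disk it lives on."""
    return disk_tb * fraction


def hits_32bit_xfs_limit(node_tb: float, disks: int) -> bool:
    # JBOD assumption: data spread evenly, each file confined to one disk.
    per_disk_tb = node_tb / disks
    return largest_file_tb(per_disk_tb) >= XFS_32BIT_FILE_LIMIT_TB
```

Even a single 5TB disk tops out around 2.5TB per file, well under 16TB, and on a 64-bit CentOS kernel the guideline calls the XFS limit essentially unlimited anyway.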


I also noted that the guidelines don’t mention setting noatime and
nodiratime flags in the fstab for data volumes, but I wonder if that’s
a common practice.
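
For reference, a minimal Python sketch that builds an fstab entry for an XFS data disk mounted with noatime. The device, mount point and helper name are made-up examples. The mount(8) man page notes that noatime already implies nodiratime on Linux, so nodiratime is left out here.

```python
from typing import Sequence


def fstab_line(device: str, mount_point: str, fstype: str = "xfs",
               options: Sequence[str] = ("defaults", "noatime"),
               dump: int = 0, fsck_pass: int = 2) -> str:
    """Build one /etc/fstab entry: six tab-separated fields."""
    return "\t".join([device, mount_point, fstype, ",".join(options),
                      str(dump), str(fsck_pass)])
```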

James


--

Founder/CEO [2]Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: [3]http://burtonator.wordpress.com
… or check out my [4]Google+ profile
War is peace. Freedom is slavery. Ignorance is strength. Corporations
are people.

References

1. mailto:james@breachintelligence.com
2. http://Spinn3r.com/
3. http://burtonator.wordpress.com/
4. https://plus.google.com/102718274791889610666/posts

--_----------=_1400269568171010--