Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: unknown (nike.apache.org: error in processing during lookup of
 bryan.talbot@playnext.com)
MIME-Version: 1.0
In-Reply-To: 
 <bb90cf75a59f4b2593823894a92a7861@BLUPR03MB310.namprd03.prod.outlook.com>
References: 
 <32d8851150cd4ed187c16c1cd6706356@DM2PR03MB318.namprd03.prod.outlook.com>
	<CAAZU44m36frNVVUZPReafmPc1P01PZi2oMQ2gKm4oBJD-oNiQQ@mail.gmail.com>
	<1400269568.17101.118281509.21488E2B@webmail.messagingengine.com>
	<bb90cf75a59f4b2593823894a92a7861@BLUPR03MB310.namprd03.prod.outlook.com>
Date: Mon, 19 May 2014 11:53:51 -0700
Message-ID: 
 <CALksHnMNOGkZps+ndZ4Ndw1+hzZhYmw_awKQfMMEN3p9mgUaqg@mail.gmail.com>
Subject: Re: Best partition type for Cassandra with JBOD
From: Bryan Talbot <bryan.talbot@playnext.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Content-Type: multipart/alternative; boundary=90e6ba25ea3bc0e71004f9c548ad

--90e6ba25ea3bc0e71004f9c548ad
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

For XFS, using noatime and nodiratime isn't really useful either.

http://xfs.org/index.php/XFS_FAQ#Q:_Is_using_noatime_or.2Fand_nodiratime_at=
_mount_time_giving_any_performance_benefits_in_xfs_.28or_not_using_them_per=
formance_decrease.29.3F
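
A minimal sketch for checking this on a running node, assuming a Linux
host with /proc/mounts (the function name xfs_mount_opts is made up for
illustration). As far as I can tell from the FAQ above, relatime is the
default and noatime already implies nodiratime, so the options column
will normally show relatime without any fstab changes:

```shell
# Print "mount-point options" for each XFS filesystem listed in a
# mounts-format file; defaults to /proc/mounts when no file is given.
xfs_mount_opts() {
    awk '$3 ~ /^xfs$/ { print $2, $4 }' "${1:-/proc/mounts}"
}
```

Run xfs_mount_opts with no arguments and look for relatime or noatime
in the second column.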


On Sat, May 17, 2014 at 7:52 AM, James Campbell <
james@breachintelligence.com> wrote:

>  Thanks for the thoughts!
> On May 16, 2014 4:23 PM, Ariel Weisberg <ariel@weisberg.ws> wrote:
>  Hi,
>
> Recommending nobarrier (mount option barrier=3D0) when you don't know if =
a
> non-volatile cache is in play is probably not the way to go. A
> non-volatile cache will typically ignore write barriers if a given block
> device is configured to cache writes anyway.
>
> I am also skeptical you will see a boost in performance. Applications tha=
t
> want to defer and batch writes won't emit write barriers frequently and
> when they do it's because the data has to be there. Filesystems depend on
> write barriers although it is surprisingly hard to get a reordering that =
is
> really bad because of the way journals are managed.
>
> Cassandra uses log structured storage and supports asynchronous periodic
> group commit so it doesn't need to emit write barriers frequently.
>
> Setting read ahead to zero on an SSD is necessary to get the maximum
> number of random reads, but will also disable prefetching for sequential
> reads. You need a lot less prefetching with an SSD due to the much faster
> response time, but it's still many microseconds.
>
> Someone with more Cassandra specific knowledge can probably give better
> advice as to when a non-zero read ahead makes sense with Cassandra. This
> is something that may be workload specific as well.
>
> Regards,
>  Ariel
>
> On Fri, May 16, 2014, at 01:55 PM, Kevin Burton wrote:
>
> That and nobarrier=E2=80=A6 and probably noop for the scheduler if using =
SSD and
> setting readahead to zero...
>
>
>  On Fri, May 16, 2014 at 10:29 AM, James Campbell <
> james@breachintelligence.com> wrote:
>
>  Hi all=E2=80=94
>
>
>
> What partition type is best/most commonly used for a multi-disk JBOD setu=
p
> running Cassandra on CentOS 64bit?
>
>
>
> The datastax production server guidelines recommend XFS for data
> partitions, saying, =E2=80=9CBecause Cassandra can use almost half your d=
isk space
> for a single file, use XFS when using large disks, particularly if using =
a
> 32-bit kernel. XFS file size limits are 16TB max on a 32-bit kernel, and
> essentially unlimited on 64-bit.=E2=80=9D
>
>
>
> However, the same document also notes that =E2=80=9CMaximum recommended c=
apacity
> for Cassandra 1.2 and later is 3 to 5TB per node,=E2=80=9D which makes me=
 think
> >16TB file sizes would be irrelevant (especially when not using RAID to
> create a single large volume).  What has been the experience of this grou=
p?
>
>
>
> I also noted that the guidelines don=E2=80=99t mention setting noatime an=
d
> nodiratime flags in the fstab for data volumes, but I wonder if that=E2=
=80=99s a
> common practice.
>
> James
>
>
>
>
> --
>
>
>  Founder/CEO Spinn3r.com
>  Location: *San Francisco, CA*
>  Skype: *burtonator*
>  blog: http://burtonator.wordpress.com
>  =E2=80=A6 or check out my Google+ profile<https://plus.google.com/102718=
274791889610666/posts>
>  <http://spinn3r.com>
>  War is peace. Freedom is slavery. Ignorance is strength. Corporations
> are people.
>
>
>


--=20
Bryan Talbot
Architect / Platform team lead, Aeria Games and Entertainment
Silicon Valley | Berlin | Tokyo | Sao Paulo

--90e6ba25ea3bc0e71004f9c548ad
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">For XFS, using noatime and nodiratime isn&#39;t really use=
ful either.<div><br></div><div><a href=3D"http://xfs.org/index.php/XFS_FAQ#=
Q:_Is_using_noatime_or.2Fand_nodiratime_at_mount_time_giving_any_performanc=
e_benefits_in_xfs_.28or_not_using_them_performance_decrease.29.3F">http://x=
fs.org/index.php/XFS_FAQ#Q:_Is_using_noatime_or.2Fand_nodiratime_at_mount_t=
ime_giving_any_performance_benefits_in_xfs_.28or_not_using_them_performance=
_decrease.29.3F</a><br>
</div><div><br></div><div><br></div></div><div class=3D"gmail_extra"><br><b=
r><div class=3D"gmail_quote">On Sat, May 17, 2014 at 7:52 AM, James Campbel=
l <span dir=3D"ltr">&lt;<a href=3D"mailto:james@breachintelligence.com" tar=
get=3D"_blank">james@breachintelligence.com</a>&gt;</span> wrote:<br>
<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex">


<div>
<p dir=3D"ltr">Thanks for the thoughts!</p><div><div class=3D"h5">
<div>On May 16, 2014 4:23 PM, Ariel Weisberg &lt;<a href=3D"mailto:ariel@we=
isberg.ws" target=3D"_blank">ariel@weisberg.ws</a>&gt; wrote:<br type=3D"at=
tribution">
</div>
<div>
<div>Hi,<br>
</div>
<div>=C2=A0</div>
<div>Recommending nobarrier (mount option barrier=3D0) when you don&#39;t k=
now if a non-volatile cache is in play is probably not the way to go. A non=
-volatile cache will typically ignore write barriers if a given block devic=
e is configured to cache writes anyway.<br>

</div>
<div>=C2=A0</div>
<div>I am also skeptical you will see a boost in performance. Applications =
that want to defer and batch writes won&#39;t emit write barriers frequentl=
y and when they do it&#39;s because the data has to be there. Filesystems d=
epend on write barriers although it is surprisingly
 hard to get a reordering that is really bad because of the way journals ar=
e managed.<br>
</div>
<div>=C2=A0</div>
<div>Cassandra uses log structured storage and supports asynchronous period=
ic group commit so it doesn&#39;t need to emit write barriers frequently.</=
div>
<div>=C2=A0</div>
<div>Setting read ahead to zero on an SSD is necessary to get the maximum n=
umber of random reads, but will also disable prefetching for sequential rea=
ds. You need a lot less prefetching with an SSD due to the much faster resp=
onse time, but it&#39;s still many microseconds.<br>

</div>
<div>=C2=A0</div>
<div>Someone with more Cassandra specific knowledge can probably give bette=
r advice as to when a non-zero read ahead makes sense with Cassandra. This =
is something that may be workload specific as well.<br>
</div>
<div>=C2=A0</div>
<div>Regards,<br>
</div>
<div>Ariel<br>
</div>
<div>=C2=A0</div>
<div>On Fri, May 16, 2014, at 01:55 PM, Kevin Burton wrote:<br>
</div>
<blockquote type=3D"cite">
<div dir=3D"ltr">That and nobarrier=E2=80=A6 and probably noop for the sche=
duler if using SSD and setting readahead to zero...<br>
</div>
<div>
<div>=C2=A0</div>
<div>=C2=A0</div>
<div>
<div>On Fri, May 16, 2014 at 10:29 AM, James Campbell <span dir=3D"ltr">&lt=
;<a href=3D"mailto:james@breachintelligence.com" target=3D"_blank">james@br=
eachintelligence.com</a>&gt;</span> wrote:<br>
</div>
<blockquote style=3D"margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-l=
eft:1ex">
<div lang=3D"EN-US">
<div>
<p>Hi all=E2=80=94<u></u><u></u><br>
</p>
<p><u></u>=C2=A0<u></u><br>
</p>
<p>What partition type is best/most commonly used for a multi-disk JBOD set=
up running Cassandra on CentOS 64bit?<u></u><u></u><br>
</p>
<p><u></u>=C2=A0<u></u><br>
</p>
<p>The datastax production server guidelines recommend XFS for data partiti=
ons, saying, =E2=80=9CBecause Cassandra can use almost half your disk space=
 for a single file, use XFS when using large disks, particularly if using a=
 32-bit kernel. XFS file size limits are
 16TB max on a 32-bit kernel, and essentially unlimited on 64-bit.=E2=80=9D=
<u></u><u></u><br>
</p>
<p><u></u>=C2=A0<u></u><br>
</p>
<p>However, the same document also notes that =E2=80=9CMaximum recommended =
capacity for Cassandra 1.2 and later is 3 to 5TB per node,=E2=80=9D which m=
akes me think &gt;16TB file sizes would be irrelevant (especially when not =
using RAID to create a single large volume).=C2=A0 What
 has been the experience of this group?<u></u><u></u><br>
</p>
<p><u></u>=C2=A0<u></u><br>
</p>
<p>I also noted that the guidelines don=E2=80=99t mention setting noatime a=
nd nodiratime flags in the fstab for data volumes, but I wonder if that=E2=
=80=99s a common practice.<span><span style=3D"color:rgb(136,136,136)"></sp=
an></span><br>

</p>
<div>=C2=A0</div>
<div><span><span style=3D"color:rgb(136,136,136)">James<u></u><u></u></span=
></span><br>
</div>
</div>
</div>
</blockquote>
</div>
<div>=C2=A0</div>
<div>=C2=A0</div>
<div>=C2=A0</div>
<div>-- <br>
</div>
<div>
<div>
<p style=3D"margin-top:0px;margin-right:0px;margin-bottom:12pt;margin-left:=
0px">
<br>
</p>
<div>Founder/CEO=C2=A0<a href=3D"http://Spinn3r.com" target=3D"_blank">Spin=
n3r.com</a><br>
</div>
<div>Location:=C2=A0<b>San Francisco, CA</b><br>
</div>
<div>Skype:=C2=A0<b>burtonator</b><br>
</div>
<div><span style=3D"color:rgb(44,44,44)"><span style=3D"font-family:Helveti=
ca,&#39; Arial&#39;,&#39; sans-serif&#39;"><span style=3D"line-height:19px"=
>blog:<b>=C2=A0</b></span></span></span><a href=3D"http://burtonator.wordpr=
ess.com" target=3D"_blank">http://burtonator.wordpress.com</a><br>

</div>
<div>=E2=80=A6 or check out my <a href=3D"https://plus.google.com/102718274=
791889610666/posts" target=3D"_blank">
Google+ profile</a><br>
</div>
<div><a href=3D"http://spinn3r.com" target=3D"_blank"><img src=3D"http://sp=
inn3r.com/images/spinn3r.jpg"></a><br>
</div>
<div><span style=3D"background-color:rgb(255,255,255)"><span style=3D"color=
:rgb(0,0,0)"><span style=3D"font-family:verdana,arial,helvetica,sans-serif"=
><span style=3D"font-size:small">War is peace. Freedom is
 slavery. Ignorance is strength. Corporations are people.</span></span></sp=
an></span><br>
</div>
<p><br>
</p>
</div>
</div>
</div>
</blockquote>
</div>
</div></div></div>

</blockquote></div><br><br clear=3D"all"><div><br></div>-- <br>Bryan Talbot=
<div>Architect / Platform team lead, Aeria Games and Entertainment</div><di=
v>Silicon Valley | Berlin | Tokyo | Sao Paulo</div><div><br></div>
</div>

--90e6ba25ea3bc0e71004f9c548ad--