From: "R. Verlangen" <robin@us2.nl>
To: user@cassandra.apache.org
Date: Sat, 17 Mar 2012 13:48:27 +0100
Subject: Re: Single Node Cassandra Installation

"By default Cassandra tries to write to both nodes, always. Writes will
only fail (on a node) if it is down, and even then hinted handoff will
attempt to keep both nodes in sync when the troubled node comes back up.
The point of having two nodes is to have read and write availability in
the face of transient failure."

Even more: if you enable read repair, the chance of serving stale data
decreases with every further read. This makes your cluster converge back
to a consistent state faster after a failure.

Also consider using different CLs for different operations. E.g. the
Twitter timeline can miss some records, but if you were to display my
bank account I would prefer to see the right thing, or a nice error
message.

2012/3/16 Ben Coverston <ben.coverston@datastax.com>

> Doing reads and writes at CL=1 with RF=2 N=2 does not imply that the reads
> will be inconsistent. It's more complicated than simply counting
> blocked replicas. It is easy to support the notion that it will be largely
> consistent, in fact very consistent for most use cases.
>
> By default Cassandra tries to write to both nodes, always. Writes will
> only fail (on a node) if it is down, and even then hinted handoff will
> attempt to keep both nodes in sync when the troubled node comes back up.
> The point of having two nodes is to have read and write availability in
> the face of transient failure.
>
> If you are interested, there is a good exposition of what 'consistency'
> means in a system like Cassandra at the link below [1].
>
> [1] http://www.eecs.berkeley.edu/~pbailis/projects/pbs/
>
>
> On Fri, Mar 16, 2012 at 6:50 AM, Thomas van Neerijnen <tom@bossastudios.com> wrote:
>
>> You'll need to either read or write at at least quorum to get consistent
>> data from the cluster, so you may as well do both.
>> Now that you mention it, I was wrong about downtime: with a two-node
>> cluster, reads or writes at quorum will mean both nodes need to be online.
>> Perhaps you could have an emergency switch in your application which flips
>> to consistency of 1 if one of your Cassandra servers goes down? Just make
>> sure it's set back to quorum when the second one returns, or again you
>> could end up with inconsistent data.
>>
>>
>> On Fri, Mar 16, 2012 at 2:04 AM, Drew Kutcharian <drew@venarc.com> wrote:
>>
>>> Thanks for the comments. I guess I will end up doing a 2-node cluster
>>> with replica count 2 and read consistency 1.
>>>
>>> -- Drew
>>>
>>>
>>> On Mar 15, 2012, at 4:20 PM, Thomas van Neerijnen wrote:
>>>
>>> So long as data loss and downtime are acceptable risks, a one-node
>>> cluster is fine.
>>> Personally this is usually only acceptable on my workstation; even my
>>> dev environment is redundant, because servers fail, usually when you least
>>> want them to, for example when you've decided to save costs by waiting
>>> before implementing redundancy. Could a failure end up costing you more
>>> than you've saved? I'd rather get cheaper servers (maybe even used off
>>> eBay?) so I could have at least two of them.
>>>
>>> If you do go with a one-node solution, although I haven't tried it
>>> myself, Priam looks like a good place to start for backups; otherwise roll
>>> your own with incremental snapshotting turned on and a watch on the
>>> snapshot directory. Storage on something like S3 or Cloud Files is very
>>> cheap, so there's no good excuse for no backups.
>>>
>>> On Thu, Mar 15, 2012 at 7:12 PM, R. Verlangen <robin@us2.nl> wrote:
>>>
>>>> Hi Drew,
>>>>
>>>> One other disadvantage is the lack of a "consistency level" and
>>>> "replication". Both are part of high availability / redundancy. So you
>>>> would really need to back up your single-node-"cluster" to some other
>>>> external location.
>>>>
>>>> Good luck!
>>>>
>>>>
>>>> 2012/3/15 Drew Kutcharian <drew@venarc.com>
>>>>
>>>>> Hi,
>>>>>
>>>>> We are working on a project that initially is going to have very
>>>>> little data, but we would like to use Cassandra to ease future
>>>>> scalability. Due to budget constraints, we were thinking of running a
>>>>> single-node Cassandra for now and then adding more nodes as required.
>>>>>
>>>>> I was wondering: is it recommended to run a single-node Cassandra in
>>>>> production? Are there any other issues besides the lack of high
>>>>> availability?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Drew
>>>>>
>>>>
>>>
>>
>
> --
> Ben Coverston
> DataStax -- The Apache Cassandra Company
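Ben's remark about "the simple counting of blocked replicas" boils down to the overlap rule W + R > N: a read is only guaranteed to see the latest write when the read and write replica sets must intersect. Here is a minimal sketch of that arithmetic for the two-node case discussed in this thread (plain Python; the function names are mine, not a Cassandra API):

```python
# Sketch of the replica counting behind the thread's consistency discussion.
# Plain Python arithmetic, not a Cassandra driver API.

def quorum(rf: int) -> int:
    """Replicas that must respond at CL=QUORUM: floor(RF/2) + 1."""
    return rf // 2 + 1

def overlap_guaranteed(rf: int, write_cl: int, read_cl: int) -> bool:
    """A read is guaranteed to see the latest write when the read and
    write replica sets must intersect: W + R > RF."""
    return write_cl + read_cl > rf

# The RF=2 cases from the thread:
print(quorum(2))                    # 2 -> quorum needs BOTH nodes online
print(overlap_guaranteed(2, 1, 1))  # False -> CL=1 reads/writes can be stale
print(overlap_guaranteed(2, 2, 1))  # True  -> write at quorum, read at 1
```

This also shows why hinted handoff and read repair matter in Ben's argument: when W + R <= RF there is no hard guarantee, and those mechanisms are what pull the replicas back in sync in practice.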

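Thomas's "emergency switch" idea from the thread can be sketched as a tiny policy function: drop from QUORUM to ONE when the two-node cluster loses a node, and flip back as soon as both nodes are up again. This is an illustrative sketch only; the names here are hypothetical and not part of any Cassandra client library.

```python
# Hypothetical sketch of the "emergency switch" described in the thread:
# degrade from QUORUM to ONE when a 2-node cluster loses a node, and
# (per Thomas's warning) return to QUORUM as soon as quorum is possible
# again, to limit the window for inconsistent data.

QUORUM = "QUORUM"
ONE = "ONE"

def pick_consistency(nodes_up: int, total_nodes: int = 2) -> str:
    """Return the consistency level the application should use right now."""
    quorum_size = total_nodes // 2 + 1
    # Enough replicas alive for quorum? Use it; otherwise degrade to ONE.
    return QUORUM if nodes_up >= quorum_size else ONE

print(pick_consistency(2))  # QUORUM: normal operation
print(pick_consistency(1))  # ONE: emergency mode, reads may be stale
```

A real application would call this before each request (or on a cluster-state callback) and pass the result to its driver; the trade-off is exactly the one in the thread, availability during the outage in exchange for possibly stale reads until the node returns.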