From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Subject: Re: Write performance help needed
Date: Thu, 5 May 2011 22:28:13 +1200

I was inserting the contents of Wikipedia, so the columns were multi-kilobyte strings. It's a good data source to run tests with, as the records and relationships vary somewhat in size.

My main point was that the best way to benchmark Cassandra is with multiple server nodes, multiple client threads/processes, the level of redundancy and consistency you want to run at in production, and, if you can, some approximation of the production data size.
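As a rough illustration of what I mean by multiple client threads and batch updates, here is a minimal sketch against a Hector 0.7-era API (my own test used Python clients; the keyspace "Test", column family "Records", thread count and batch size below are made-up placeholders, not recommendations):

    import java.util.concurrent.*;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class WriteBench {
        static final int THREADS = 8;             // client threads; tune to your hardware
        static final int ROWS_PER_THREAD = 12500; // 8 x 12,500 = 100K rows total
        static final int COLS_PER_ROW = 30;
        static final int BATCH_ROWS = 50;         // rows sent per batch_mutate call

        public static void main(String[] args) throws Exception {
            Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
            final Keyspace ks = HFactory.createKeyspace("Test", cluster); // placeholder keyspace
            ExecutorService pool = Executors.newFixedThreadPool(THREADS);
            final CountDownLatch done = new CountDownLatch(THREADS);

            long start = System.currentTimeMillis();
            for (int t = 0; t < THREADS; t++) {
                final int id = t;
                pool.submit(new Runnable() {
                    public void run() {
                        Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
                        for (int r = 0; r < ROWS_PER_THREAD; r++) {
                            String key = "row-" + id + "-" + r;
                            for (int c = 0; c < COLS_PER_ROW; c++) {
                                m.addInsertion(key, "Records", // placeholder column family
                                    HFactory.createStringColumn("col" + c, "value" + c));
                            }
                            if ((r + 1) % BATCH_ROWS == 0) m.execute(); // one batched round trip
                        }
                        m.execute(); // flush any remainder
                        done.countDown();
                    }
                });
            }
            done.await();
            pool.shutdown();
            long ms = System.currentTimeMillis() - start;
            long cols = (long) THREADS * ROWS_PER_THREAD * COLS_PER_ROW;
            System.out.println(cols + " columns in " + ms + " ms = "
                + (cols * 1000L / ms) + " columns/sec");
        }
    }

Run a few copies of that from more than one machine, against more than one node, at the replication factor and consistency level you plan to use in production, and the numbers start to mean something.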
A single Cassandra instance may well lose against a single RDBMS instance in a straight-out race (though, as Jonathan points out below, Mongo is not playing fair; see the WriteConcern sketch after the quoted thread). But you generally would not deploy a single Cassandra node.

If you can provide some more details on your test we may be able to help:
- what the target application is
- the Cassandra schema and any configuration changes
- the Java code you used

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 5 May 2011, at 02:01, Steve Smith wrote:

> Since each row in my column family has 30 columns, wouldn't this translate to ~8,000 rows per second... or am I misunderstanding something?
>
> Talking in terms of columns, my load test would seem to perform as follows:
>
> 100,000 rows / 26 sec * 30 columns/row = ~115K columns per second.
>
> That's on a dual-core, 2.66 GHz laptop with 4GB RAM... a single running Cassandra node... Hector (Java) client.
>
> Am I interpreting things correctly?
>
> - Steve
>
> On Tue, May 3, 2011 at 3:59 PM, aaron morton <aaron@thelastpickle.com> wrote:
> To give an idea, last March (2010) I ran a much older Cassandra on 10 HP blades (dual socket, 4 core, 16GB, 2.5" laptop HDD) and was writing around 250K columns per second, with 500 Python processes loading the data from Wikipedia running on another 10 HP blades.
>
> This was my first out-of-the-box, no-tuning test (other than using sensible batch updates). Since then Cassandra has gotten much faster.
>
> Hope that helps
> Aaron
>
> On 4 May 2011, at 02:22, Jonathan Ellis wrote:
>
> > You don't give many details, but I would guess:
> >
> > - your benchmark is not multithreaded
> > - mongodb is not configured for durable writes, so you're really only
> >   measuring the time for it to buffer it in memory
> > - you haven't loaded enough data to hit "mongo's index doesn't fit in
> >   memory anymore"
> >
> > On Tue, May 3, 2011 at 8:24 AM, Steve Smith <stevenpsmith123@gmail.com> wrote:
> >> I am working for a client that needs to persist 100K-200K records per
> >> second for later querying. As a proof of concept, we are looking at
> >> several options, including NoSQL (Cassandra and MongoDB).
> >> I have been running some tests on my laptop (MacBook Pro, 4GB RAM,
> >> 2.66 GHz, dual core / 4 logical cores) and have not been happy with
> >> the results.
> >> The best I have been able to accomplish is 100K records in
> >> approximately 30 seconds. Each record has 30 columns, mostly made up
> >> of integers. I have tried both the Hector and Pelops APIs, and have
> >> tried writing in batches versus one at a time. The times have not
> >> varied much.
> >> I am using the out-of-the-box configuration for Cassandra, and while
> >> I know using one disk will have an impact on performance, I would
> >> expect to see better write numbers than I am getting.
> >> As a point of reference, with the same test using MongoDB I was able
> >> to accomplish 100K records in 3.5 seconds.
> >> Any tips would be appreciated.
> >>
> >> - Steve
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of DataStax, the source for professional Cassandra support
> > http://www.datastax.com
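To make Jonathan's second point concrete: the 2.x-era MongoDB Java driver defaulted to fire-and-forget writes, so a simple insert loop mostly measures how fast the client can push bytes into a socket. A minimal sketch of what a fairer comparison might set (the database, collection and field names here are invented for illustration):

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.Mongo;
    import com.mongodb.WriteConcern;

    public class DurableInsert {
        public static void main(String[] args) throws Exception {
            Mongo mongo = new Mongo("localhost"); // 2.x-era driver API
            DBCollection coll = mongo.getDB("test").getCollection("records");

            // Driver default (NORMAL): returns as soon as the message is
            // handed to the socket; neither errors nor durability are confirmed.
            coll.insert(new BasicDBObject("fast", true));

            // SAFE: blocks on a getLastError round trip, so the server has at
            // least applied the write in memory before the client continues.
            coll.setWriteConcern(WriteConcern.SAFE);
            coll.insert(new BasicDBObject("acknowledged", true));

            // FSYNC_SAFE: additionally waits for a flush to disk, much closer
            // to Cassandra appending every write to its commit log.
            coll.setWriteConcern(WriteConcern.FSYNC_SAFE);
            coll.insert(new BasicDBObject("on_disk", true));

            mongo.close();
        }
    }

Even with the write concern raised, Jonathan's other two guesses still apply: a single-threaded loop understates both systems, and a data set small enough to keep Mongo's indexes in RAM flatters it.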