From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Subject: Re: Write performance help needed
Date: Thu, 5 May 2011 22:28:13 +1200

I was inserting the contents of Wikipedia, so the columns were multi-kilobyte strings. It's a good data source to run tests with, as the records and relationships vary somewhat in size.

My main point was that the best way to benchmark Cassandra is with multiple server nodes, multiple client threads/processes, the level of redundancy and consistency you want to run at in production, and, if you can, some approximation of the production data size.
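As a rough illustration of what I mean by multiple client threads and batch updates, here is a minimal sketch against a Hector 0.7-era API (my own test used Python clients; the keyspace "Test", column family "Records", thread count and batch size below are made-up placeholders, not recommendations):

    import java.util.concurrent.*;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class WriteBench {
        static final int THREADS = 8;             // client threads; tune to your hardware
        static final int ROWS_PER_THREAD = 12500; // 8 x 12,500 = 100K rows total
        static final int COLS_PER_ROW = 30;
        static final int BATCH_ROWS = 50;         // rows sent per batch_mutate call

        public static void main(String[] args) throws Exception {
            Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
            final Keyspace ks = HFactory.createKeyspace("Test", cluster); // placeholder keyspace
            ExecutorService pool = Executors.newFixedThreadPool(THREADS);
            final CountDownLatch done = new CountDownLatch(THREADS);

            long start = System.currentTimeMillis();
            for (int t = 0; t < THREADS; t++) {
                final int id = t;
                pool.submit(new Runnable() {
                    public void run() {
                        Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
                        for (int r = 0; r < ROWS_PER_THREAD; r++) {
                            String key = "row-" + id + "-" + r;
                            for (int c = 0; c < COLS_PER_ROW; c++) {
                                m.addInsertion(key, "Records", // placeholder column family
                                    HFactory.createStringColumn("col" + c, "value" + c));
                            }
                            if ((r + 1) % BATCH_ROWS == 0) m.execute(); // one batched round trip
                        }
                        m.execute(); // flush any remainder
                        done.countDown();
                    }
                });
            }
            done.await();
            pool.shutdown();
            long ms = System.currentTimeMillis() - start;
            long cols = (long) THREADS * ROWS_PER_THREAD * COLS_PER_ROW;
            System.out.println(cols + " columns in " + ms + " ms = "
                + (cols * 1000L / ms) + " columns/sec");
        }
    }

Run a few copies of that from more than one machine, against more than one node, at the replication factor and consistency level you plan to use in production, and the numbers start to mean something.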
A single Cassandra instance may well lose against a single RDBMS instance in a straight-out race (though, as Jonathan points out below, Mongo is not playing fair; see the WriteConcern sketch after the quoted thread). But you generally would not deploy a single Cassandra node.

If you can provide some more details on your test we may be able to help:
- what the target application is
- the Cassandra schema and any configuration changes
- the Java code you used

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 5 May 2011, at 02:01, Steve Smith wrote:

> Since each row in my column family has 30 columns, wouldn't this translate to ~8,000 rows per second... or am I misunderstanding something?
>
> Talking in terms of columns, my load test would seem to perform as follows:
>
> 100,000 rows / 26 sec * 30 columns/row = ~115K columns per second.
>
> That's on a dual-core, 2.66 GHz laptop with 4GB RAM... a single running Cassandra node... Hector (Java) client.
>
> Am I interpreting things correctly?
>
> - Steve
>
> On Tue, May 3, 2011 at 3:59 PM, aaron morton <aaron@thelastpickle.com> wrote:
> To give an idea, last March (2010) I ran a much older Cassandra on 10 HP blades (dual socket, 4 core, 16GB, 2.5" laptop HDD) and was writing around 250K columns per second, with 500 Python processes loading the data from Wikipedia running on another 10 HP blades.
>
> This was my first out-of-the-box, no-tuning test (other than using sensible batch updates). Since then Cassandra has gotten much faster.
>
> Hope that helps
> Aaron
>
> On 4 May 2011, at 02:22, Jonathan Ellis wrote:
>
> > You don't give many details, but I would guess:
> >
> > - your benchmark is not multithreaded
> > - mongodb is not configured for durable writes, so you're really only
> >   measuring the time for it to buffer it in memory
> > - you haven't loaded enough data to hit "mongo's index doesn't fit in
> >   memory anymore"
> >
> > On Tue, May 3, 2011 at 8:24 AM, Steve Smith <stevenpsmith123@gmail.com> wrote:
> >> I am working for a client that needs to persist 100K-200K records per
> >> second for later querying. As a proof of concept, we are looking at
> >> several options, including NoSQL (Cassandra and MongoDB).
> >> I have been running some tests on my laptop (MacBook Pro, 4GB RAM,
> >> 2.66 GHz, dual core / 4 logical cores) and have not been happy with
> >> the results.
> >> The best I have been able to accomplish is 100K records in
> >> approximately 30 seconds. Each record has 30 columns, mostly made up
> >> of integers. I have tried both the Hector and Pelops APIs, and have
> >> tried writing in batches versus one at a time. The times have not
> >> varied much.
> >> I am using the out-of-the-box configuration for Cassandra, and while
> >> I know using one disk will have an impact on performance, I would
> >> expect to see better write numbers than I am getting.
> >> As a point of reference, with the same test using MongoDB I was able
> >> to accomplish 100K records in 3.5 seconds.
> >> Any tips would be appreciated.
> >>
> >> - Steve
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of DataStax, the source for professional Cassandra support
> > http://www.datastax.com
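To make Jonathan's second point concrete: the 2.x-era MongoDB Java driver defaulted to fire-and-forget writes, so a simple insert loop mostly measures how fast the client can push bytes into a socket. A minimal sketch of what a fairer comparison might set (the database, collection and field names here are invented for illustration):

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCollection;
    import com.mongodb.Mongo;
    import com.mongodb.WriteConcern;

    public class DurableInsert {
        public static void main(String[] args) throws Exception {
            Mongo mongo = new Mongo("localhost"); // 2.x-era driver API
            DBCollection coll = mongo.getDB("test").getCollection("records");

            // Driver default (NORMAL): returns as soon as the message is
            // handed to the socket; neither errors nor durability are confirmed.
            coll.insert(new BasicDBObject("fast", true));

            // SAFE: blocks on a getLastError round trip, so the server has at
            // least applied the write in memory before the client continues.
            coll.setWriteConcern(WriteConcern.SAFE);
            coll.insert(new BasicDBObject("acknowledged", true));

            // FSYNC_SAFE: additionally waits for a flush to disk, much closer
            // to Cassandra appending every write to its commit log.
            coll.setWriteConcern(WriteConcern.FSYNC_SAFE);
            coll.insert(new BasicDBObject("on_disk", true));

            mongo.close();
        }
    }

Even with the write concern raised, Jonathan's other two guesses still apply: a single-threaded loop understates both systems, and a data set small enough to keep Mongo's indexes in RAM flatters it.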