Date: Sat, 11 Aug 2012 15:32:13 -0500
Subject: Re: anyone have any performance numbers? and here are some perf numbers of my own...
From: Tyler Hobbs <tyler@datastax.com>
To: user@cassandra.apache.org

One node can typically handle 30k+ inserts per second, so you should be
able to insert the 9 million rows in about 5 minutes with a single-node
cluster.  My guess is that you're inserting with a single thread, which
means you're bound by network latency.  Try using 100 threads, or better,
just use the stress tool that comes with Cassandra:
http://www.datastax.com/docs/1.0/references/stress_java

On Fri, Aug 10, 2012 at 5:02 PM, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:
> Ignore the third one, my math was bad; it worked out to 733 bytes/row, and
> it ended up being 6.6 gig, as it compacted some after the load was done,
> when the load was light (I noticed that a bit later).
>
> But what about the other two?  Is that about the time you'd expect?
>
> Thanks,
> Dean
>
> On 8/10/12 3:50 PM, "Hiller, Dean" <Dean.Hiller@nrel.gov> wrote:
>
> >****** 3. In my test below, I see there is now 8 gig of data and
> >9,000,000 rows.  Does that sound right?  Nearly 1MB of space used per
> >row for a 50-column row?  That sounds like a huge amount of overhead
> >(my values are longs in every column, but that is still not much).  I
> >was expecting KB per row maybe, but MB per row?  My column names are
> >"col"+i as well, so they are very short too.
> >
> >A common configuration is 1T drives per node, so I was wondering if
> >anyone ran any tests with map/reduce on reading in all those rows (not
> >doing anything with the data, just reading it in).
> >
> >****** 1. How long does it take to go through the 500GB that would be
> >on that node?
> >
> >I ran some tests just writing a fake table 50 columns wide, and I am
> >seeing it will take about 31 hours to write 500GB of information (a
> >node is about full at 500GB, since you need to reserve 30-50% of the
> >space for compaction and such).  I.e., if I need to rerun any kind of
> >indexing, it will take 31 hours.  Does this sound about
> >normal/ballpark?  Obviously many nodes will be below that, so this
> >would be the worst case with 1T drives.
> >
> >****** 2. Anyone have any other data?
> >
> >Thanks,
> >Dean

--
Tyler Hobbs
DataStax
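[A quick sanity check on the figures discussed in this thread, as a minimal
Python sketch.  The 30k inserts/sec, 9M rows, and 6.6 GB numbers come from
the messages above; the 1 ms network round-trip time is an assumed figure
for illustration only.]

```python
# Sanity-check the capacity math from this thread.

ROWS = 9_000_000          # rows Dean is loading (from the thread)
NODE_RATE = 30_000        # inserts/sec one node can typically handle (from the thread)

# Bulk-load time at full node throughput: ~5 minutes, as Tyler says.
bulk_secs = ROWS / NODE_RATE
print(f"bulk load: {bulk_secs:.0f} s (~{bulk_secs / 60:.0f} min)")

# A single synchronous writer is bound by network round-trip time, not by
# the node.  A 1 ms RTT is an assumed value, not a number from the thread.
RTT_SECS = 0.001
single_rate = 1 / RTT_SECS                      # one in-flight insert at a time
threaded_rate = min(100 / RTT_SECS, NODE_RATE)  # 100 threads, capped by the node
print(f"1 thread: {single_rate:.0f}/s, 100 threads: {threaded_rate:.0f}/s")

# Dean's corrected row-size math: 6.6 GB over 9M rows is ~733 bytes/row.
bytes_per_row = 6.6e9 / ROWS
print(f"bytes per row: {bytes_per_row:.0f}")
```

[This is why the thread count matters so much: with a 1 ms round trip, a
single thread tops out around 1,000 inserts/sec regardless of how fast the
node is, while 100 concurrent writers saturate the node itself.]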
