From: Mohammad Hajjat
Date: Thu, 18 Jul 2013 19:18:47 -0400
Subject: Re: Recommended data size for Reads/Writes in Cassandra
To: user@cassandra.apache.org

Thanks Andrey and Tyler! That was useful :)

Do you guys have any idea why the 10 MB writes took so long in my case, even though I'm using Large VMs with plenty of resources? Or do you think this latency is expected?
I'm trying to see how much time is spent in the network versus in CPU processing on the nodes; any suggestions for a good profiling tool?



On Thu, Jul 18, 2013 at 5:50 PM, Tyler Hobbs <tyler@datastax.com> wrote:
The default limit is 16 MB, but realistically you should try to keep writes under 10 MB, breaking up large values into multiple columns/rows if necessary.
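
As a rough illustration of the splitting suggested above, the sketch below cuts a large value into fixed-size chunks keyed by zero-padded column names, so that each chunk could be written as its own column. The helper names `split_into_columns` and `join_columns`, the `chunk_` prefix, and the 1 MB chunk size are assumptions made for this example; they are not part of pycassa or Cassandra.

```python
def split_into_columns(data, chunk_size=1024 * 1024, prefix="chunk_"):
    """Split a bytes value into {column_name: chunk} pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    columns = {}
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        # Zero-padded names sort lexicographically in chunk order.
        columns["%s%08d" % (prefix, index)] = data[offset:offset + chunk_size]
    return columns


def join_columns(columns):
    """Reassemble the original value by concatenating chunks in column-name order."""
    return b"".join(columns[name] for name in sorted(columns))


payload = b"x" * (2 * 1024 * 1024 + 512 * 1024)  # 2.5 MB of sample data
columns = split_into_columns(payload)             # 1 MB chunks by default
restored = join_columns(columns)                  # round-trips to the original bytes
```

Each entry could then be written with its own insert call rather than one call carrying the whole value, so that no single Thrift message approaches the limit. Whether columns or rows are the better unit to split on depends on the data model.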


On Thu, Jul 18, 2013 at 4:31 PM, Andrey Ilinykh <ailinykh@gmail.com> wrote:
There is a limit on the Thrift message size (thrift_max_message_length_in_mb); by default it is 64 MB, if I'm not mistaken. This is your limit.


On Thu, Jul 18, 2013 at 2:03 PM, hajjat <hajjat@purdue.edu> wrote:
Hi,

Is there a recommended data size for Reads/Writes in Cassandra? I tried
inserting 10 MB objects and the latency I got was pretty high. Also, I was
never able to insert larger objects (say 50 MB) since Cassandra kept
crashing when I tried that.

Here is my experiment setup:
I used two Large VMs in EC2 within the same data-center. Inserts have ALL consistency (strong consistency). The latencies were as follows:
Data size:    10 MB      1 MB      100 Bytes
Latency:      250 ms     50 ms     8 ms

I've also done the same for two Large VMs across two data-centers. The
latencies were around:
Data size:    10 MB      1 MB      100 Bytes
Latency:      1200 ms    800 ms    80 ms

1) Isn't the 10 MB latency extremely high?
2) Is there a recommended data size to use with Cassandra (e.g., a few bytes
up to 1 MB)?
3) Also, I tried inserting 50 MB data but Cassandra kept crashing. Does
anybody know why? I thought the max data size should be up to 2 GB?
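
For a sense of scale on question 1, the reported latencies can be turned into an effective throughput (size divided by latency). This is only arithmetic on the numbers in the message, taking 1 MB as 10^6 bytes (an assumption); it says nothing about where the time is actually spent.

```python
# Reported (size in bytes, latency in seconds) pairs from the message above.
same_dc = {"10 MB": (10e6, 0.250), "1 MB": (1e6, 0.050), "100 Bytes": (100, 0.008)}
cross_dc = {"10 MB": (10e6, 1.200), "1 MB": (1e6, 0.800), "100 Bytes": (100, 0.080)}


def throughput_mb_per_s(size_bytes, latency_s):
    """Effective throughput in MB/s (1 MB taken as 10**6 bytes)."""
    return size_bytes / latency_s / 1e6


for label, table in (("same data center", same_dc), ("across data centers", cross_dc)):
    for size, (nbytes, latency) in table.items():
        print("%-20s %-10s %8.4f MB/s" % (label, size, throughput_mb_per_s(nbytes, latency)))
```

On these figures the 10 MB write moves about 40 MB/s within one data center and about 8.3 MB/s across data centers, while the 100-byte writes are dominated by fixed per-request cost.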

Thanks,
Mohammad

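PS. Here is the Python code I use to insert into Cassandra. I put my
stopwatch timers around the insert statement:
    import pycassa
    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    # TEST_FILE and keyspace are defined elsewhere in the script.
    # Read the whole test file as raw bytes; 'with' closes the handle afterwards.
    with open(TEST_FILE, 'rb') as fh:
        data = fh.read()

    POOL = ConnectionPool(keyspace, server_list=['localhost:9160'], timeout=None)
    USER = ColumnFamily(POOL, 'User')
    # Write at consistency level ALL, so every replica must acknowledge the insert.
    USER.insert('Ali', {'data': data},
                write_consistency_level=pycassa.cassandra.ttypes.ConsistencyLevel.ALL)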
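
The stopwatch measurement mentioned above could look roughly like the following sketch. It times only the client-side call, so network and server time are lumped together. `timed` and `fake_insert` are hypothetical names introduced here; `fake_insert` merely stands in for the real insert call so the example runs without a cluster.

```python
import time
from timeit import default_timer


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = default_timer()
    result = fn(*args, **kwargs)
    return result, default_timer() - start


def fake_insert(key, columns):
    """Hypothetical stand-in for the real insert call; sleeps briefly instead of writing."""
    time.sleep(0.01)
    return len(columns)


result, elapsed = timed(fake_insert, "Ali", {"data": b"payload"})
print("insert took %.1f ms" % (elapsed * 1000.0))
```

Repeating the measurement several times and looking at the spread, rather than a single run, gives a fairer picture of the latency.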




--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Recommended-data-size-for-Reads-Writes-in-Cassandra-tp7589141.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.




--
Tyler Hobbs
DataStax



--
Mohammad Hajjat
Ph.D. Student
Electrical and Computer Engineering
Purdue University