Subject: Re: Performance problem with large wide row inserts using CQL
From: Peter Lin
Date: Thu, 20 Feb 2014 19:45:13 -0500
To: user@cassandra.apache.org

Yeah

Slowly, NoSQL products are adding schema :)

At least Cassandra is ahead of the curve

Sent from my iPhone

On Feb 20, 2014, at 7:37 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

Recommendations in Cassandra have a shelf life of about 1 to 2 years. If you try to assert a recommendation from a year ago, you stand a solid chance of someone telling you there is now a better way.

Cassandra once loved being a schemaless datastore. Imagine that?


On Thursday, February 20, 2014, Peter Lin <woolfel@gmail.com> wrote:
>
> Good example, Ed.
>
> I'm so happy to see other people doing things like this. Even though the official DataStax docs recommend not mixing static and dynamic columns, to me that recommendation is a huge disservice to Cassandra users.
>
> If someone really wants to stick to the relational model, then NewSQL is a better fit, and it gives users the full power of SQL with subqueries, LIKE, and joins. NewSQL can't handle these kinds of use cases because of the static nature of relational tables, row size limits, and column limits.
>
>
>
> On Thu, Feb 20, 2014 at 6:18 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>
> CASSANDRA-6561 is interesting, though having statically defined columns is not exactly a solution for doing everything you can do in "thrift".
>
> http://planetcassandra.org/blog/post/poking-around-with-an-idea-ranged-metadata/
>
> Before collections or CQL existed, I implemented some of these concepts myself.
>
> Say you have a column family named AllMyStuff
>
> Columns named "friends_" would be a string, and together they would form a "Map" of friends to ages
>
> set AllMyStuff[edward][friends_bob]=34
>
> set AllMyStuff[edward][friends_sara]=33
>
> The column named password could be a string
>
> set AllMyStuff[edward][password]='mother'
>
> Columns named phone[00] through phone[100] would be an array of phone numbers
>
> set AllMyStuff[edward][phone[00]]='555-5555'
>
> It was quite easy for me to slice all the phone numbers
>
> startkey: phone
> endkey: phone[100]
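
A minimal sketch of the column slice described above, assuming a wide row can be modeled as an in-memory map whose column names are compared as plain strings. The `slice_columns` helper and the sample data are hypothetical stand-ins, not a Thrift or CQL API:

```python
from bisect import bisect_left, bisect_right

# One wide row of the AllMyStuff column family, modeled as
# column name -> value (an illustrative in-memory stand-in).
row = {
    "friends_bob": 34,
    "friends_sara": 33,
    "password": "mother",
    "phone[00]": "555-5555",
    "action_0001": "page hit",
}

def slice_columns(row, start, end):
    """Return (name, value) pairs whose names fall in [start, end]."""
    names = sorted(row)              # columns are kept in name order
    lo = bisect_left(names, start)   # first name >= start
    hi = bisect_right(names, end)    # one past the last name <= end
    return [(name, row[name]) for name in names[lo:hi]]

phones = slice_columns(row, "phone", "phone[100]")
print(phones)
```

Because the names sort lexicographically, everything between the start and end keys comes back as one contiguous range, which is what makes the prefix convention cheap to query.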
>
> But then every column starting with "action_xxxx" could be a page hit, and I could have thousands or tens of thousands of these
>
> In many cases CQL has nice (or nicer) abstractions for some of these things. But its largest drawback for me is that I cannot take this already existing column family AllMyStuff and 'explain' it to CQL. It's a perfectly valid way to design something, and it is probably more space efficient than the system of composites CQL uses to pack things. I feel that as a data access language it dictates too much schema: not only what is in the row schema, but also the format of the data on disk. Also, schemas like mine above are very valid, but selecting them into a table of fixed rows and columns does not map well.
>
> The way Hive tackles this problem is that the metadata is interpreted by a SerDe, so that the physical data and the logical definition are not coupled.
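
To illustrate the decoupling described above, here is a rough sketch in which name-prefix rules turn the raw columns of one row into a logical record. The `deserialize` function and its rules are hypothetical; this is neither Hive's SerDe interface nor anything Cassandra ships:

```python
def deserialize(raw_row):
    """Interpret raw (column name -> value) pairs using name-prefix rules."""
    logical = {"friends": {}, "password": None, "phones": [], "actions": []}
    for name in sorted(raw_row):                 # walk columns in name order
        value = raw_row[name]
        if name.startswith("friends_"):          # prefix rule: map of friend -> age
            logical["friends"][name[len("friends_"):]] = int(value)
        elif name == "password":                 # exact name: a single string
            logical["password"] = value
        elif name.startswith("phone["):          # prefix rule: array of phone numbers
            logical["phones"].append(value)
        elif name.startswith("action_"):         # prefix rule: unbounded list of page hits
            logical["actions"].append(value)
    return logical

raw = {
    "friends_bob": "34",
    "friends_sara": "33",
    "password": "mother",
    "phone[00]": "555-5555",
}
record = deserialize(raw)
print(record)
```

The stored columns stay exactly as they were written; only the interpretation rules decide which logical shape a reader sees.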
>
>
>
>
> On Thu, Feb 20, 2014 at 5:23 PM, DuyHai Doan <doanduyhai@gmail.com> wrote:
>
> Rüdiger
>
> "SortedMap<byte[], SortedMap<byte[], Pair<Long, byte[]>>>"
>
> When using a RandomPartitioner or Murmur3Partitioner, the outer map is a simple Map, not SortedMap.
>
> The only case you have a SortedMap for row key is when using OrderPreservingPartitioner, which is clearly not advised for most cases because of hot spots in the cluster.
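
A small sketch of why the outer map is not sorted by row key under a hash-based partitioner. MD5 is used here as a simplified stand-in for the partitioner's hash, and the row keys are made up for illustration:

```python
import hashlib

def token(row_key: bytes) -> int:
    # MD5 stands in for the partitioner's hash function (a simplification).
    return int.from_bytes(hashlib.md5(row_key).digest(), "big")

row_keys = [b"alice", b"bob", b"carol", b"dave"]

by_key = sorted(row_keys)               # ordering by the raw key bytes
by_token = sorted(row_keys, key=token)  # ordering by hashed token

print(by_key)
print(by_token)
```

Rows are placed by token, so neighbouring keys generally land far apart on the ring. That spreads load evenly, but it also means a range scan over row keys has no meaningful order.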
>
>
>
> On Thu, Feb 2

--
Sorry this was sent from mobile. Will do less grammar and spell check than usual.