incubator-cassandra-user mailing list archives

From Peter Lin <wool...@gmail.com>
Subject Re: Performance problem with large wide row inserts using CQL
Date Fri, 21 Feb 2014 00:45:13 GMT

Yeah

Slowly, NoSQL products are adding schema :)

At least Cassandra is ahead of the curve

Sent from my iPhone

> On Feb 20, 2014, at 7:37 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> 
> Recommendations in Cassandra have a shelf life of about one to two years. If you try to
> assert a recommendation from a year ago, you stand a solid chance of someone telling you
> there is now a better way.
> 
> Cassandra once loved being a schemaless datastore. Imagine that?
> 
> 
> On Thursday, February 20, 2014, Peter Lin <woolfel@gmail.com> wrote:
> >
> > Good example, Ed.
> >
> > I'm so happy to see other people doing things like this. Even though the official
> > DataStax docs recommend not mixing static and dynamic columns, to me that advice is a
> > huge disservice to Cassandra users.
> >
> > If someone really wants to stick to the relational model, then NewSQL is a better fit,
> > and it gives users the full power of SQL with subqueries, LIKE, and joins. But NewSQL
> > can't handle these kinds of use cases because of the static nature of relational
> > tables and their row size and column count limits.
> >
> >
> >
> > On Thu, Feb 20, 2014 at 6:18 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> >
> > CASSANDRA-6561 is interesting, though statically defined columns are not exactly a
> > solution for doing everything you can do in Thrift.
> >
> > http://planetcassandra.org/blog/post/poking-around-with-an-idea-ranged-metadata/
> >
> > Before collections or CQL existed I did some of these concepts myself.
> >
> > Say you have a column family named AllMyStuff
> >
> > Columns whose names start with "friends_" would be strings, and together they would
> > form a "Map" of friends to ages
> >
> > set AllMyStuff[edward][friends_bob]=34
> >
> > set AllMyStuff[edward][friends_sara]=33
> >
> > The column named password could be a string
> >
> > set AllMyStuff[edward][password]='mother'
> >
> > Columns named phone[00] through phone[100] would form an array of phone numbers
> >
> > set AllMyStuff[edward][phone[00]]='555-5555'
> >
> > It was quite easy for me to slice all the phone numbers
> >
> > startkey: phone
> > endkey: phone[100]
> >
> > But then every column starting with "action_xxxx" could be a page hit, and I could
> > have thousands or tens of thousands of these.
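The slicing trick described above relies on Cassandra keeping the columns of a row sorted by name. A minimal sketch in Python, assuming an ASCII/UTF-8 column comparator and modeling one row of the AllMyStuff column family as an in-memory dict; the helpers `slice_columns` and `slice_prefix` are hypothetical illustrations, not a Cassandra API:

```python
import bisect

# Hypothetical in-memory model of one wide row in AllMyStuff:
# column name -> value. Assumes an ASCII/UTF-8 comparator, so columns
# are ordered lexically by name; we keep a sorted list of names for that.
row = {
    "friends_bob": "34",
    "friends_sara": "33",
    "password": "mother",
    "phone[00]": "555-5555",
    "phone[01]": "555-1234",
    "action_0001": "/index.html",
    "action_0002": "/about.html",
}
names = sorted(row)


def slice_columns(start, end):
    """Return (name, value) pairs with start <= name <= end in sorted order,
    like a column slice with an inclusive start and end column."""
    lo = bisect.bisect_left(names, start)
    hi = bisect.bisect_right(names, end)
    return [(n, row[n]) for n in names[lo:hi]]


def slice_prefix(prefix):
    """Return every column whose name starts with prefix. Because names are
    sorted, such columns form one contiguous run beginning at the prefix."""
    lo = bisect.bisect_left(names, prefix)
    out = []
    for n in names[lo:]:
        if not n.startswith(prefix):
            break
        out.append((n, row[n]))
    return out


print(slice_columns("phone", "phone[100]"))
print(slice_prefix("friends_"))
```

Because the comparison is lexical, an index in names like phone[00] only sorts numerically if every index is zero-padded to the same width; otherwise phone[2] sorts after phone[100] and falls outside that slice.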
> >
> > In many cases CQL has nice (or nicer) abstractions for some of these things. But its
> > largest drawback for me is that I cannot take this existing column family AllMyStuff
> > and 'explain' it to CQL. It's a perfectly valid way to design something, and it is
> > probably more space-efficient than the system of composites CQL uses to pack things.
> > I feel that, as a data access language, it dictates too much schema: not only the row
> > schema, but also the format of the data on disk. Also, schemas like mine above are
> > perfectly valid, but selecting them into a table of fixed rows and columns does not
> > map well.
> >
> > Hive tackles this problem by having a SerDe interpret the metadata, so that the
> > physical data and the logical definition are not coupled.
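The SerDe idea can be sketched in a few lines: the physical row stays exactly as stored, and a separate reader decides what each column means. This is a hypothetical Python illustration of the concept only, not Hive's actual SerDe interface; the naming rules and the `interpret_row` function are assumptions based on the AllMyStuff layout described earlier in the thread:

```python
import re

# Hypothetical naming rule for the phone "array" columns, e.g. phone[00].
PHONE = re.compile(r"^phone\[(\d+)\]$")


def interpret_row(raw):
    """Turn a physical wide row (column name -> string value) into a logical
    record. The storage layout is untouched; only this reader knows how to
    map column names to logical fields."""
    record = {"friends": {}, "password": None, "phones": [], "actions": {}}
    phones = []
    for name, value in raw.items():
        m = PHONE.match(name)
        if name.startswith("friends_"):
            # friends_<name> -> age, stored as a string, read as an int
            record["friends"][name[len("friends_"):]] = int(value)
        elif name == "password":
            record["password"] = value
        elif m:
            phones.append((int(m.group(1)), value))
        elif name.startswith("action_"):
            record["actions"][name[len("action_"):]] = value
        # Columns matching no rule are simply ignored by this reader.
    # Order the phone "array" by its numeric index, not by string order.
    record["phones"] = [v for _, v in sorted(phones)]
    return record


raw = {
    "friends_bob": "34",
    "friends_sara": "33",
    "password": "mother",
    "phone[00]": "555-5555",
    "phone[01]": "555-1234",
    "action_0001": "/index.html",
}
print(interpret_row(raw))
```

A different reader could interpret the same stored columns another way without rewriting any data, which is the decoupling the SerDe approach is after.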
> >
> >
> >
> >
> > On Thu, Feb 20, 2014 at 5:23 PM, DuyHai Doan <doanduyhai@gmail.com> wrote:
> >
> > Rüdiger
> >
> > "SortedMap<byte[], SortedMap<byte[], Pair<Long, byte[]>>>"
> >
> > When using RandomPartitioner or Murmur3Partitioner, the outer map is a plain Map, not
> > a SortedMap.
> >
> > The only case where the row keys form a SortedMap is with OrderPreservingPartitioner,
> > which is not advised for most cases because it causes hot spots in the cluster.
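To see why the row map is only sorted under an order-preserving partitioner, compare key order with token order. A minimal Python sketch, assuming an MD5-based token in the style of RandomPartitioner (Murmur3Partitioner uses a different hash that the standard library does not provide, but rows are likewise ordered by token rather than by key); `md5_token` is a hypothetical helper, not Cassandra's code:

```python
import hashlib


def md5_token(key: bytes) -> int:
    """Token in the style of RandomPartitioner (an approximation): the MD5
    digest of the row key read as a signed 128-bit big-endian integer,
    then made non-negative."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


keys = [b"alice", b"bob", b"carol", b"dave", b"edward"]

# Order-preserving partitioner: rows are laid out in key order, so the
# outer row map behaves like a SortedMap keyed by row key.
opp_order = sorted(keys)

# Hash-based partitioner: rows are laid out in token order, which bears no
# useful relation to key order, so scanning a range of keys is not meaningful.
hash_order = sorted(keys, key=md5_token)

print([k.decode() for k in opp_order])
print([k.decode() for k in hash_order])
```

Under the order-preserving scheme, sequential keys such as timestamps tend to fall into the same token range and therefore onto the same node, which is the source of the hot spots mentioned above.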
> >
> >
> >
> > On Thu, Feb 2
> 
> -- 
> Sorry this was sent from mobile. Will do less grammar and spell check than usual.
