incubator-cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Performance problem with large wide row inserts using CQL
Date Fri, 21 Feb 2014 00:37:23 GMT
Recommendations in Cassandra have a shelf life of about 1 to 2 years. If you
try to assert a recommendation from a year ago, you stand a solid chance of
someone telling you there is now a better way.

Cassandra once loved being a schemaless datastore. Imagine that?


On Thursday, February 20, 2014, Peter Lin <woolfel@gmail.com> wrote:
>
> good example Ed.
>
> I'm so happy to see other people doing things like this. Even if the
official DataStax docs recommend against mixing static and dynamic columns,
to me that's a huge disservice to Cassandra users.
>
> If someone really wants to stick to the relational model, then NewSql is a
better fit, plus it gives users the full power of SQL with subqueries, LIKE,
and joins. NewSql can't handle these kinds of use cases due to the static
nature of relational tables, the row size limit and the column limit.
>
>
>
> On Thu, Feb 20, 2014 at 6:18 PM, Edward Capriolo <edlinuxguru@gmail.com>
wrote:
>
> CASSANDRA-6561 is interesting. Though having statically defined columns
is not exactly a solution to do everything in "thrift".
>
>
http://planetcassandra.org/blog/post/poking-around-with-an-idea-ranged-metadata/
>
> Before collections or CQL existed I did some of these concepts myself.
>
> Say you have a column family named AllMyStuff
>
> Columns whose names start with "friends_" would be strings, and together
they would be a "Map" of friends to age
>
> set AllMyStuff[edward][friends_bob]=34
>
> set AllMyStuff[edward][friends_sara]=33
>
> Column name password could be a string
>
> set AllMyStuff[edward][password]='mother'
>
> Columns named phone[00] phone[100] would be an array of phone numbers
>
> set AllMyStuff[edward][phone[00]]='555-5555'
>
> It was quite easy for me to slice all the phone numbers
>
> startkey: phone
> endkey: phone[100]
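
A minimal sketch of the slice described above, in Python rather than the Thrift API. The dict is a hypothetical in-memory stand-in for the "edward" row, the function name slice_columns is made up for illustration, and the second phone number and the action_ column are invented sample data. It only assumes that column names are compared as plain strings, so every "phone..." column sits in one contiguous range.

```python
from bisect import bisect_left, bisect_right

# Hypothetical stand-in for the "edward" row of AllMyStuff.
# Values other than those quoted in the thread are made-up sample data.
row = {
    "friends_bob": "34",
    "friends_sara": "33",
    "password": "mother",
    "phone[00]": "555-5555",
    "phone[100]": "555-0100",      # invented second phone number
    "action_0001": "/index.html",  # invented page-hit column
}


def slice_columns(columns, start, end):
    """Return (name, value) pairs with start <= name <= end, in name order."""
    names = sorted(columns)            # columns are stored sorted by name
    lo = bisect_left(names, start)     # first name >= start
    hi = bisect_right(names, end)      # one past the last name <= end
    return [(name, columns[name]) for name in names[lo:hi]]


phones = slice_columns(row, "phone", "phone[100]")
for name, value in phones:
    print(name, value)
```

Because the comparison is on strings, "phone[100]" sorts before "phone[20]", so a scheme like this probably wants zero-padded indexes of a fixed width.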
>
> But then every column starting with "action_xxxx" could be a page hit and
I could have thousands or tens of thousands of these
>
> In many cases CQL has nice/nicer abstractions for some of these things.
But its largest detraction for me is that I cannot take this already
existing column family AllMyStuff and 'explain' it to CQL. It's a perfectly
valid way to design something, and it is probably more space
efficient than the system of composites CQL uses to pack things. I
feel that as a data access language it dictates too much schema: not only
what is in the row schema, but also the format of the data on disk.
Also, schemas like mine above are very valid, but selecting them into
a table of fixed rows and columns does not map well.
>
> The way Hive tackles this problem is that the metadata is
interpreted by a SerDe, so that the physical data and the logical definition
are not coupled.
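
To illustrate the decoupling idea (this is not Hive's SerDe interface, only a rough Python sketch of the concept), the function below reads the physical columns of the AllMyStuff row and builds a logical view from the naming conventions described earlier. The name interpret_row and the shape of the returned dict are assumptions made for this example.

```python
def interpret_row(columns):
    """Build a logical view of a row from its column-name conventions.

    The physical layout (flat name -> value pairs) is left untouched;
    only the interpretation lives here, so it can change independently.
    """
    logical = {"friends": {}, "phones": [], "actions": {}, "other": {}}
    for name in sorted(columns):              # walk columns in name order
        value = columns[name]
        if name.startswith("friends_"):       # "Map" of friend -> age
            logical["friends"][name[len("friends_"):]] = int(value)
        elif name.startswith("phone["):       # "array" of phone numbers
            logical["phones"].append(value)
        elif name.startswith("action_"):      # page hits, possibly thousands
            logical["actions"][name[len("action_"):]] = value
        else:                                 # plain scalar columns
            logical["other"][name] = value
    return logical


# Hypothetical physical row, using the values quoted in the thread.
physical = {
    "friends_bob": "34",
    "friends_sara": "33",
    "password": "mother",
    "phone[00]": "555-5555",
}
view = interpret_row(physical)
print(view)
```

A different interpreter could be swapped in for the same stored columns without rewriting any data, which is the property the SerDe comparison is pointing at.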
>
>
>
>
> On Thu, Feb 20, 2014 at 5:23 PM, DuyHai Doan <doanduyhai@gmail.com> wrote:
>
> Rüdiger
>
> "SortedMap<byte[], SortedMap<byte[], Pair<Long, byte[]>>>"
>
>  When using a RandomPartitioner or Murmur3Partitioner, the outer map is a
simple Map, not SortedMap.
>
>  The only case you have a SortedMap for row key is when using
OrderPreservingPartitioner, which is clearly not advised for most cases
because of hot spots in the cluster.
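
A small sketch of why the outer map is not sorted by row key under a hashing partitioner. It is a simplification, not Cassandra's actual token code: md5_token is a hypothetical helper that only mimics the idea of placing rows by a hash of the key, and the keys are made-up sample data.

```python
import hashlib


def md5_token(key: bytes) -> int:
    """Hypothetical token function: the MD5 digest of the key as an integer."""
    return int.from_bytes(hashlib.md5(key).digest(), "big")


# Made-up row keys for illustration.
keys = [f"user{i:02d}".encode() for i in range(20)]

# Hash-based placement: rows are ordered by token, not by key.
hashed_order = sorted(keys, key=md5_token)

# Order-preserving placement: rows are ordered by the key bytes themselves.
preserved_order = sorted(keys)

print([k.decode() for k in hashed_order[:5]])
print([k.decode() for k in preserved_order[:5]])
```

With hashed placement, adjacent keys such as user00 and user01 land at unrelated positions, which spreads load across the ring but rules out range scans over row keys. Order-preserving placement keeps the range scan and takes on the hot-spot risk mentioned above.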
>
>
>
> On Thu, Feb 2

-- 
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.
