incubator-cassandra-dev mailing list archives

From Ben McCann <...@benmccann.com>
Subject Re: Document storage
Date Thu, 29 Mar 2012 19:06:25 GMT
Jonathan, I asked Brian about his REST API
<https://groups.google.com/forum/?fromgroups#!topic/virgil-users/oncBas9C8Us>,
and he said he does not take the JSON objects and split them up because
the client libraries do not agree on an implementation.  That was exactly
my concern with this solution as well.  I would be perfectly happy to do it
this way instead of using JSON if it were standardized.  The reason I
suggested JSON is that it is standardized.  As far as I can tell, Cassandra
doesn't support maps and lists in a standardized way today, which is the
root of my problem.

-Ben
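
To make the standardization concern concrete, here is a minimal sketch of how a client might split a JSON document into individual columns. The function name `flatten_document` and the ':' path separator are assumptions made up for this illustration, not any client library's API; they are exactly the kind of convention that every client would have to agree on for the columns to be readable across libraries.

```python
import json


def flatten_document(doc, prefix=""):
    """Flatten nested dicts and lists into {column_name: value} pairs.

    Column names are the path to each leaf joined with ':' (an assumed
    convention). Leaf values are stored as JSON text so their type survives.
    Empty dicts and lists produce no columns in this simple sketch.
    """
    if isinstance(doc, dict):
        items = doc.items()
    elif isinstance(doc, list):
        # List positions become path components: "emails:0", "emails:1", ...
        items = ((str(i), v) for i, v in enumerate(doc))
    else:
        return {prefix: json.dumps(doc)}

    columns = {}
    for key, value in items:
        path = f"{prefix}:{key}" if prefix else str(key)
        columns.update(flatten_document(value, path))
    return columns


profile = {
    "name": "Ann",
    "emails": ["a@example.com", "b@example.com"],
    "address": {"city": "Oslo"},
}
columns = flatten_document(profile)
for name, value in columns.items():
    print(name, "=", value)
```

A second client that picked a different separator, or encoded list positions differently, would produce columns the first client could not reassemble, which is the interoperability problem described above.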


On Thu, Mar 29, 2012 at 11:30 AM, Drew Kutcharian <drew@venarc.com> wrote:

> Yes, I meant the "row header index". What I have done is store an object
> (e.g. UserProfile) that is always read or written as a whole (a user
> updates their user details in a single page in the UI). I serialize that
> object into binary JSON using the SMILE format, then compress it with
> Snappy on the client side. So as far as Cassandra cares, it's storing a
> byte[].
>
> Now on the client side, I'm using cassandra-cli with a custom type that
> knows how to turn a byte[] into JSON text and back. The only issue was
> CASSANDRA-4081 where "assume" doesn't work with custom types. If
> CASSANDRA-4081 gets fixed, I'll get the best of both worlds.
>
> The advantages of this over Thrift-based super column families are:
>
> 1. Saving CPU on the Cassandra nodes, since serialization/deserialization
> and compression/decompression happen on the client nodes, where there is
> plenty of idle CPU time.
>
> 2. Saving network bandwidth, since I'm sending a compressed byte[].
>
>
> -- Drew
>
>
>
> On Mar 29, 2012, at 11:16 AM, Jonathan Ellis wrote:
>
> > On Thu, Mar 29, 2012 at 1:11 PM, Drew Kutcharian <drew@venarc.com> wrote:
> >>> I think this is a much better approach because that gives you the
> >>> ability to update or retrieve just parts of objects efficiently,
> >>> rather than making column values just blobs with a bunch of special
> >>> case logic to introspect them.  Which feels like a big step backwards
> >>> to me.
> >>
> >> Unless your access pattern involves reading/writing the whole document
> >> each time. In that case you're better off serializing the whole document
> >> and storing it in a column as a byte[] without incurring the overhead of
> >> column indexes. Right?
> >
> > Hmm, not sure what you're thinking of there.
> >
> > If you mean the "index" that's part of the row header for random
> > access within a row, then no, serializing to byte[] doesn't save you
> > anything.
> >
> > If you mean secondary indexes, don't declare any if you don't want any. :)
> >
> > Just telling C* to store a byte[] *will* be slightly lighter-weight
> > than giving it named columns, but we're talking negligible compared to
> > the overhead of actually moving the data on or off disk in the first
> > place.  Not even close to being worth giving up being able to deal
> > with your data from standard tools like cqlsh, IMO.
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of DataStax, the source for professional Cassandra support
> > http://www.datastax.com
>
>
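
The whole-object approach Drew describes in the quoted thread (serialize the object, compress it on the client, and hand Cassandra an opaque byte[]) can be sketched roughly as follows. This is only an illustration: it uses Python's standard `json` and `zlib` modules as stand-ins for the SMILE and Snappy formats named in the thread, and the function names `encode_profile` and `decode_profile` are hypothetical.

```python
import json
import zlib


def encode_profile(profile: dict) -> bytes:
    """Serialize the whole object, then compress it on the client.

    Stand-ins: plain JSON text instead of SMILE, zlib instead of Snappy.
    The result is the opaque byte string that would be written as a
    single column value.
    """
    raw = json.dumps(profile, separators=(",", ":"), sort_keys=True)
    return zlib.compress(raw.encode("utf-8"))


def decode_profile(blob: bytes) -> dict:
    """Reverse of encode_profile: decompress, then parse the JSON text."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


profile = {
    "name": "Ann",
    "emails": ["a@example.com", "b@example.com"],
    "address": {"city": "Oslo"},
}
blob = encode_profile(profile)     # what the server would store
restored = decode_profile(blob)    # what the client gets back
print(len(blob), "bytes stored; round trip ok:", restored == profile)
```

The trade-off discussed in the thread is visible here: the server never sees anything but bytes, so every read and write moves the whole object, and server-side tools cannot inspect individual fields without a custom type that knows how to decode the blob.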
