incubator-cassandra-dev mailing list archives

From Drew Kutcharian <>
Subject Re: Document storage
Date Thu, 29 Mar 2012 18:05:23 GMT
Hi Ben,

Sure, there's nothing really to it, but I'll email it to you. As for why I compress with Snappy
in the type instead of using sstable_compression: when you set sstable_compression, the
compression happens on the Cassandra nodes, and I see two advantages to my approach:

1. Saving CPU on the Cassandra nodes. Compression and decompression can easily be done on
the client nodes, where there is plenty of idle CPU time.

2. Saving network bandwidth, since you're sending a compressed byte[] over the wire.

One thing to note about my approach is that when I define the schema in Cassandra, I define
the columns as byte[], not as my custom type, and I do all the conversion on the client side.

-- Drew
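Drew's pipeline (serialize on the client, compress on the client, store the result in a plain bytes column) can be sketched as below. This is a minimal sketch under stated assumptions, not his actual type: UTF-8 JSON text stands in for Jackson's SMILE encoding, and the JDK's `java.util.zip.Deflater`/`Inflater` stand in for Snappy, so it runs with no dependencies. `ClientSideCodec`, `encode` and `decode` are hypothetical names. With the real libraries you would instead serialize through an `ObjectMapper` built on Jackson's `SmileFactory` and compress with snappy-java's `Snappy.compress`/`Snappy.uncompress`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical client-side codec: serialize and compress before the write,
// decompress and deserialize after the read. Cassandra only sees opaque bytes.
// Assumptions: UTF-8 JSON text stands in for Jackson SMILE, and
// Deflater/Inflater stand in for Snappy, so this needs only the JDK.
final class ClientSideCodec {
    private ClientSideCodec() {}

    /** Serializes (here: UTF-8 text) and compresses on the client. */
    static byte[] encode(String json) {
        byte[] raw = json.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        try {
            deflater.setInput(raw);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    /** Decompresses and deserializes on the client after reading the column. */
    static String decode(byte[] stored) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(stored);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported column value");
                }
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("column value is not valid compressed data", e);
        } finally {
            inflater.end();
        }
    }
}
```

Because both steps run before the write and after the read, the Cassandra nodes only ever handle the compressed bytes, which is where the CPU and bandwidth savings in points 1 and 2 come from. The column itself can stay a plain bytes column, as Drew describes.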

On Mar 29, 2012, at 12:04 AM, Ben McCann wrote:

> Sounds awesome Drew.  Mind sharing your custom type?  I just wrote a basic
> JSON type and did the validation the same way you did, but I don't have any
> SMILE support yet.  It seems that if your type were committed to the
> Cassandra codebase then the issue you ran into of the CLI only supporting
> built-in types would no longer be a problem for you (though fixing the
> issue anyway would be good and I voted for it).  Btw, any reason you
> compress it with Snappy yourself instead of just setting sstable_compression
> to SnappyCompressor and
> letting Cassandra do that part?
> -Ben
> On Wed, Mar 28, 2012 at 11:28 PM, Drew Kutcharian <> wrote:
>> I'm actually doing something almost the same. I serialize my objects into
>> byte[] using Jackson's SMILE format, then compress it using Snappy then
>> store the byte[] in Cassandra. I actually created a simple Cassandra Type
>> for this but I hit a wall with cassandra-cli:
>> Please vote on the JIRA if you are interested.
>> Validation is pretty simple: you just need to read the value and parse it
>> using Jackson. If you don't get any exceptions, your JSON/Smile is valid ;)
>> -- Drew
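Drew's validation rule ("parse it; if nothing throws, it's valid") can be sketched as follows. Assumptions: a small hand-written recursive-descent JSON syntax checker stands in for Jackson, the hypothetical `JsonValidator.validate(ByteBuffer)` is only loosely shaped like a Cassandra type's validate method, and `IllegalArgumentException` stands in for whatever error the real type would raise. It checks syntax only and does not build a document tree.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical validator: parse the column value and reject it if parsing fails.
// Assumptions: this small recursive-descent syntax checker stands in for Jackson,
// and IllegalArgumentException stands in for the real type's validation error.
final class JsonValidator {
    private final String s;
    private int pos;

    private JsonValidator(String s) { this.s = s; }

    /** Throws IllegalArgumentException unless the bytes hold exactly one JSON value. */
    static void validate(ByteBuffer bytes) {
        // duplicate() leaves the caller's buffer position untouched;
        // malformed UTF-8 is replaced with U+FFFD here rather than rejected.
        String text = StandardCharsets.UTF_8.decode(bytes.duplicate()).toString();
        JsonValidator p = new JsonValidator(text);
        p.value();
        p.ws();
        if (p.pos != text.length()) throw p.error("trailing data");
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at offset " + pos);
    }

    private void ws() {
        while (pos < s.length() && " \t\r\n".indexOf(s.charAt(pos)) >= 0) pos++;
    }

    private char peek() {
        if (pos >= s.length()) throw error("unexpected end of input");
        return s.charAt(pos);
    }

    private void expect(char c) {
        if (peek() != c) throw error("expected '" + c + "'");
        pos++;
    }

    private void value() {
        ws();
        char c = peek();
        if (c == '{') members('}', true);
        else if (c == '[') members(']', false);
        else if (c == '"') string();
        else if (c == '-' || (c >= '0' && c <= '9')) number();
        else if (s.startsWith("true", pos)) pos += 4;
        else if (s.startsWith("false", pos)) pos += 5;
        else if (s.startsWith("null", pos)) pos += 4;
        else throw error("unexpected character");
    }

    /** Parses an object (keyed = true) or an array, up to its closing bracket. */
    private void members(char close, boolean keyed) {
        pos++; // skip the opening '{' or '['
        ws();
        if (peek() == close) { pos++; return; }
        while (true) {
            if (keyed) { ws(); string(); ws(); expect(':'); }
            value();
            ws();
            if (peek() != ',') { expect(close); return; }
            pos++;
        }
    }

    private void string() {
        expect('"');
        while (true) {
            char c = peek();
            pos++;
            if (c == '"') return;
            if (c < 0x20) throw error("unescaped control character");
            if (c == '\\') {
                char e = peek();
                pos++;
                if (e == 'u') {
                    for (int i = 0; i < 4; i++, pos++) {
                        if ("0123456789abcdefABCDEF".indexOf(peek()) < 0) throw error("bad unicode escape");
                    }
                } else if ("\"\\/bfnrt".indexOf(e) < 0) {
                    throw error("bad escape");
                }
            }
        }
    }

    private void number() {
        if (peek() == '-') pos++;
        if (peek() == '0') pos++; else digits(); // no leading zeros
        if (pos < s.length() && s.charAt(pos) == '.') { pos++; digits(); }
        if (pos < s.length() && (s.charAt(pos) == 'e' || s.charAt(pos) == 'E')) {
            pos++;
            if (pos < s.length() && (s.charAt(pos) == '+' || s.charAt(pos) == '-')) pos++;
            digits();
        }
    }

    /** Consumes one or more ASCII digits. */
    private void digits() {
        char c = peek();
        if (c < '0' || c > '9') throw error("expected digit");
        while (pos < s.length() && s.charAt(pos) >= '0' && s.charAt(pos) <= '9') pos++;
    }
}
```

The point of the shape is the one Drew makes: validation needs no schema, only a parser whose failure becomes the rejection. Swapping in Jackson would replace the private methods with a single parse call.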
>> On Mar 28, 2012, at 9:28 PM, Ben McCann wrote:
>>> I don't imagine sorting is a meaningful operation on JSON data. As long as
>>> the sorting is consistent, I would think that should be sufficient.
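One way to get the "consistent" ordering Ben describes is to ignore JSON structure entirely and compare the raw serialized bytes. The sketch below (hypothetical class `RawBytesOrder`) compares byte by byte as unsigned values, with a shorter prefix sorting first. The resulting order is total and deterministic but carries no JSON meaning: two documents that differ only in key order or whitespace compare as unequal. Unsigned comparison is a deliberate choice here, since `ByteBuffer.compareTo` compares bytes as signed values.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;

// Hypothetical comparator: orders serialized JSON values by their raw bytes,
// treating each byte as unsigned (0x00..0xFF). Deterministic, not semantic.
final class RawBytesOrder implements Comparator<ByteBuffer> {
    @Override
    public int compare(ByteBuffer a, ByteBuffer b) {
        int n = Math.min(a.remaining(), b.remaining());
        for (int i = 0; i < n; i++) {
            // Absolute gets: neither buffer's position is modified.
            int x = a.get(a.position() + i) & 0xFF;
            int y = b.get(b.position() + i) & 0xFF;
            if (x != y) return Integer.compare(x, y);
        }
        // Equal common prefix: the shorter value sorts first.
        return Integer.compare(a.remaining(), b.remaining());
    }
}
```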
>>> On Wed, Mar 28, 2012 at 8:51 PM, Edward Capriolo <> wrote:
>>>> Some work I did stores JSON blobs in columns. The question with a JSON
>>>> type is how to sort it.
>>>> On Wed, Mar 28, 2012 at 7:35 PM, Jeremy Hanna <> wrote:
>>>>> I don't speak for the project, but you might give it a day or two for
>>>>> people to respond and/or perhaps create a JIRA ticket. Seems like that's a
>>>>> reasonable data type that would get some traction: a JSON type. However,
>>>>> what would validation look like? That's one of the main reasons the data
>>>>> types and validators exist: to validate on insert.
>>>>> On Mar 29, 2012, at 12:27 AM, Ben McCann wrote:
>>>>>> Any thoughts? I'd like to submit a patch, but only if it will be accepted.
>>>>>> Thanks,
>>>>>> Ben
>>>>>> On Wed, Mar 28, 2012 at 8:58 AM, Ben McCann <> wrote:
>>>>>>> Hi,
>>>>>>> I was wondering if it would be interesting to add some type of
>>>>>>> document-oriented data type.
>>>>>>> I've found it somewhat awkward to store document-oriented data in
>>>>>>> Cassandra today. I can make a JSON/Protobuf/Thrift object, serialize it,
>>>>>>> and store it, but Cassandra cannot differentiate it from any other string
>>>>>>> or byte array. However, if my column validation_class could be JsonType,
>>>>>>> that would allow tools to potentially do more interesting introspection on
>>>>>>> the column value. For example, bug 3647 calls for supporting arbitrarily
>>>>>>> nested "documents" in CQL. Running a query against the JSON column in Pig
>>>>>>> is possible as well, but in this use case it would be helpful to be able to
>>>>>>> encode in column metadata that the column is stored as JSON. For debugging,
>>>>>>> running nightly reports, etc., it would be quite useful compared to the
>>>>>>> opaque string and byte types we have today. JSON is appealing because it
>>>>>>> would be easy to implement. Something like Thrift or Protocol Buffers would
>>>>>>> actually be interesting since they would be more space efficient. However,
>>>>>>> they would be a bit more difficult to implement because of the extra typing
>>>>>>> information they provide. I'm hoping that with Cassandra 1.0's addition of
>>>>>>> compression, storing JSON is not too inefficient.
>>>>>>> Would there be interest in adding a JsonType? I could look at putting a
>>>>>>> patch together.
>>>>>>> Thanks,
>>>>>>> Ben
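Ben's point that Thrift or Protocol Buffers would be more space efficient can be illustrated by one contributing factor: a schema-based binary encoding does not repeat field names in every value, because the schema carries them. The sketch below uses a hypothetical three-field record and `java.io.DataOutputStream` as a stand-in for a schema-based format; it is not the Thrift or Protocol Buffers wire encoding. It also assumes the name needs no JSON escaping.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical record (id, name, age) encoded two ways, for size comparison only.
final class SizeComparison {
    private SizeComparison() {}

    /** Self-describing JSON: every value carries its field names. Assumes name needs no escaping. */
    static byte[] asJson(long id, String name, int age) {
        String json = "{\"id\":" + id + ",\"name\":\"" + name + "\",\"age\":" + age + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Schema-based binary stand-in: the reader must already know the field
     * order and types, so no names are stored.
     */
    static byte[] asBinary(long id, String name, int age) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(id);   // 8 bytes
            out.writeUTF(name);  // 2-byte length prefix + modified UTF-8
            out.writeInt(age);   // 4 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

For `id = 42`, `name = "ben"`, `age = 30`, the JSON form is 31 bytes and the binary form is 17 (8 for the long, 2 + 3 for the length-prefixed string, 4 for the int). Compression narrows this gap for repetitive data, which is what Ben is hoping Cassandra 1.0's compression will do for JSON.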
