incubator-cassandra-dev mailing list archives

From Drew Kutcharian
Subject Re: Document storage
Date Thu, 29 Mar 2012 06:28:20 GMT
I'm actually doing almost the same thing. I serialize my objects into a byte[] using Jackson's
Smile format, compress it with Snappy, then store the byte[] in Cassandra. I actually
created a simple Cassandra type for this, but I hit a wall with cassandra-cli:

Please vote on the JIRA if you are interested.
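
A minimal sketch of the serialize, compress, store pipeline described above. It is
written in Python with the standard library only, so `json` and `zlib` are stand-ins
for Jackson's Smile format and Snappy, and the function names are hypothetical. The
resulting bytes are what would be written to a Cassandra column.

```python
import json
import zlib


def encode_document(obj):
    """Serialize an object to compact JSON bytes, then compress them.

    Assumption: json + zlib stand in for Jackson Smile + Snappy.
    """
    raw = json.dumps(obj, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return zlib.compress(raw)


def decode_document(blob):
    """Reverse encode_document: decompress, then parse the JSON bytes."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Round trip: the decoded value equals the original object.
doc = {"user": "drew", "tags": ["cassandra", "smile"], "visits": 3}
blob = encode_document(doc)
restored = decode_document(blob)
```

The column value is an opaque byte[] as far as Cassandra is concerned, which is why
tools cannot inspect it without knowing the encoding.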

Validation is pretty simple: you just need to read the value and parse it using Jackson. If
you don't get any exceptions, your JSON/Smile is valid ;)
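
A rough sketch of that parse-to-validate idea, in Python with the standard `json`
module standing in for Jackson. The function name is hypothetical; a real validator
would live in the custom Cassandra type.

```python
import json


def is_valid_json(value):
    """Return True if the raw column bytes parse as JSON, False otherwise."""
    try:
        json.loads(value.decode("utf-8"))
        return True
    except (UnicodeDecodeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        return False
```

If the value is compressed before storage, it would need to be decompressed before
this check.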

-- Drew

On Mar 28, 2012, at 9:28 PM, Ben McCann wrote:

> I don't imagine sort is a meaningful operation on JSON data.  As long as
> the sorting is consistent I would think that should be sufficient.
> On Wed, Mar 28, 2012 at 8:51 PM, Edward Capriolo wrote:
>> Some work I did stores JSON blobs in columns. The question on JSON
>> type is how to sort it.
>> On Wed, Mar 28, 2012 at 7:35 PM, Jeremy Hanna wrote:
>>> I don't speak for the project, but you might give it a day or two for
>>> people to respond and/or perhaps create a jira ticket.  Seems like that's a
>>> reasonable data type that would get some traction - a json type.  However,
>>> what would validation look like?  That's one of the main reasons there are
>>> the data types and validators, in order to validate on insert.
>>> On Mar 29, 2012, at 12:27 AM, Ben McCann wrote:
>>>> Any thoughts?  I'd like to submit a patch, but only if it will be
>>>> accepted.
>>>> Thanks,
>>>> Ben
>>>> On Wed, Mar 28, 2012 at 8:58 AM, Ben McCann wrote:
>>>>> Hi,
>>>>> I was wondering if it would be interesting to add some type of
>>>>> document-oriented data type.
>>>>> I've found it somewhat awkward to store document-oriented data in
>>>>> Cassandra today.  I can make a JSON/Protobuf/Thrift, serialize it, and
>>>>> store it, but Cassandra cannot differentiate it from any other string or
>>>>> byte array.  However, if my column validation_class could be a JsonType
>>>>> that would allow tools to potentially do more interesting introspection on
>>>>> the column value.  E.g. bug 3647 calls for supporting arbitrarily nested
>>>>> "documents" in CQL.  Running a query against the JSON column in Pig is
>>>>> possible as well, but again in this use case it would be helpful to be
>>>>> able to encode in column metadata that the column is stored as JSON.  For
>>>>> debugging, running nightly reports, etc. it would be quite useful compared
>>>>> to the opaque string and byte array types we have today.  JSON is
>>>>> appealing because it would be easy to implement.
>>>>> Something like Thrift or Protocol Buffers would actually be interesting
>>>>> since they would be more space efficient.  However, they would also be a
>>>>> bit more difficult to implement because of the extra typing information
>>>>> they provide.  I'm hoping with Cassandra 1.0's addition of compression that
>>>>> storing JSON is not too inefficient.
>>>>> Would there be interest in adding a JsonType?  I could look at putting a
>>>>> patch together.
>>>>> Thanks,
>>>>> Ben
