incubator-cassandra-dev mailing list archives

From Edward Capriolo
Subject Re: Document storage
Date Thu, 29 Mar 2012 03:51:07 GMT
Some work I did stores JSON blobs in columns. The open question for a
JSON type is how to sort it.

On Wed, Mar 28, 2012 at 7:35 PM, Jeremy Hanna wrote:
> I don't speak for the project, but you might give it a day or two for
> people to respond and/or perhaps create a jira ticket.  A json type seems
> like a reasonable data type that would get some traction.  However, what
> would validation look like?  That's one of the main reasons the data types
> and validators exist: to validate on insert.
> On Mar 29, 2012, at 12:27 AM, Ben McCann wrote:
>> Any thoughts?  I'd like to submit a patch, but only if it will be accepted.
>> Thanks,
>> Ben
>> On Wed, Mar 28, 2012 at 8:58 AM, Ben McCann wrote:
>>> Hi,
>>> I was wondering if it would be interesting to add some type of
>>> document-oriented data type.
>>> I've found it somewhat awkward to store document-oriented data in
>>> Cassandra today.  I can make a JSON/Protobuf/Thrift object, serialize it, and
>>> store it, but Cassandra cannot differentiate it from any other string or
>>> byte array.  However, if my column validation_class could be a JsonType
>>> that would allow tools to potentially do more interesting introspection on
>>> the column value.  E.g. bug 3647 calls for supporting arbitrarily
>>> nested "documents" in CQL.  Running a
>>> query against the JSON column in Pig is possible as well, but again in this
>>> use case it would be helpful to be able to encode in column metadata that
>>> the column is stored as JSON.  For debugging, running nightly reports, etc.
>>> it would be quite useful compared to the opaque string and byte array types
>>> we have today.  JSON is appealing because it would be easy to implement.
>>> Something like Thrift or Protocol Buffers would actually be interesting
>>> since they would be more space efficient.  However, they would also be a
>>> bit more difficult to implement because of the extra typing information
>>> they provide.  I'm hoping with Cassandra 1.0's addition of compression that
>>> storing JSON is not too inefficient.
>>> Would there be interest in adding a JsonType?  I could look at putting a
>>> patch together.
>>> Thanks,
>>> Ben
