incubator-cassandra-dev mailing list archives

From Jeremiah Jordan <JEREMIAH.JOR...@morningstar.com>
Subject RE: Document storage
Date Thu, 29 Mar 2012 14:57:49 GMT
It's not clear what 3647 actually is; there is no code attached and no real example in it.

Aside from that, the reason this would be useful to me (if we could get indexing of attributes
working) is that I already have my data in JSON/Thrift/Protobuf. Depending on how large the
data is, it isn't trivial to break it up into columns to insert and re-assemble it from columns
to read.  Also, until we get multiple slice range reads, I can't read two different structures
out of one row without getting all the other stuff between them, unless there are only two
columns and I read them using column names, not slices.
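
As a rough illustration of the slice limitation described above, here is a toy model of a row as a sorted list of (column name, value) pairs. It is not Cassandra code: `slice_read`, `multi_slice_read`, and the "structure:field" column naming are hypothetical, chosen only to show why one range read drags in everything between two structures.

```python
from bisect import bisect_left, bisect_right


def slice_read(row, start, finish):
    """Return every column with start <= name <= finish, as one range read would.

    `row` is a list of (name, value) pairs already sorted by name.
    """
    names = [name for name, _ in row]
    lo = bisect_left(names, start)
    hi = bisect_right(names, finish)
    return row[lo:hi]


def multi_slice_read(row, ranges):
    """Hypothetical multi-range read: the union of several independent slices."""
    result = []
    for start, finish in ranges:
        result.extend(slice_read(row, start, finish))
    return result


# Three structures (a, b, c) stored side by side in one row.
row = sorted({
    "a:id": 1, "a:name": "x",
    "b:id": 2, "b:name": "y",
    "c:id": 3, "c:name": "z",
}.items())

# One slice covering structures a and c also returns b's columns.
single = slice_read(row, "a:", "c:~")

# Two separate ranges return only the columns of a and c.
multi = multi_slice_read(row, [("a:", "a:~"), ("c:", "c:~")])
```

In this model, `single` contains all six columns while `multi` contains only the four that were wanted, which is the extra read cost the paragraph above refers to.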

As it is right now, I have to maintain custom indexes on all my attributes to be able to put
Protobuf into


________________________________________
From: Jake Luciani [jakers@gmail.com]
Sent: Thursday, March 29, 2012 7:44 AM
To: dev@cassandra.apache.org
Subject: Re: Document storage

Is there a reason you would prefer a JSONType over CASSANDRA-3647?  It
would seem the only thing a JSON type offers you is validation.  3647 takes
it much further by deconstructing a JSON document using composite columns
to flatten the document out, with the ability to access and update portions
of the document (as well as reconstruct it).
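
To make the flattening idea concrete, here is a minimal sketch of deconstructing a JSON document into path/value pairs and rebuilding it. It is not the CASSANDRA-3647 design: the `flatten` and `rebuild` helpers are hypothetical, tuples of keys stand in for composite column names, and for simplicity only nested objects are descended into (lists and scalars are kept as JSON-encoded leaf values).

```python
import json


def flatten(doc, prefix=()):
    """Yield (path, json_value) pairs, one per leaf of a nested dict.

    `path` is a tuple of keys, standing in for a composite column name.
    """
    if isinstance(doc, dict) and doc:
        for key, value in doc.items():
            yield from flatten(value, prefix + (key,))
    else:
        # Scalars, lists, and empty objects are stored as one encoded value.
        yield prefix, json.dumps(doc)


def rebuild(columns):
    """Reassemble a document from (path, json_value) pairs."""
    root = {}
    for path, raw in columns:
        if not path:
            # The whole document was a single leaf value.
            return json.loads(raw)
        node = root
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = json.loads(raw)
    return root


doc = {"user": {"name": "ben", "address": {"city": "SF"}}, "tags": ["a", "b"]}

# One entry per leaf, keyed by its path through the document.
columns = dict(flatten(doc))

# Update one portion of the document without touching the rest.
columns[("user", "address", "city")] = json.dumps("NYC")

updated = rebuild(sorted(columns.items()))
```

Because each leaf lives under its own path, a single field can be read or overwritten on its own, and sorting by path keeps the columns of a sub-document next to each other.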

On Wed, Mar 28, 2012 at 11:58 AM, Ben McCann <ben@benmccann.com> wrote:

> Hi,
>
> I was wondering if it would be interesting to add some type of
> document-oriented data type.
>
> I've found it somewhat awkward to store document-oriented data in Cassandra
> today.  I can make a JSON/Protobuf/Thrift, serialize it, and store it, but
> Cassandra cannot differentiate it from any other string or byte array.
>  However, if my column validation_class could be a JsonType that would
> allow tools to potentially do more interesting introspection on the column
> value.  E.g. bug 3647
> <https://issues.apache.org/jira/browse/CASSANDRA-3647> calls for
> supporting arbitrarily nested "documents" in CQL.  Running a
> query against the JSON column in Pig is possible as well, but again in this
> use case it would be helpful to be able to encode in column metadata that
> the column is stored as JSON.  For debugging, running nightly reports, etc.
> it would be quite useful compared to the opaque string and byte array types
> we have today.  JSON is appealing because it would be easy to implement.
>  Something like Thrift or Protocol Buffers would actually be interesting
> since they would be more space efficient.  However, they would also be a
> bit more difficult to implement because of the extra typing information
> they provide.  I'm hoping with Cassandra 1.0's addition of compression that
> storing JSON is not too inefficient.
>
> Would there be interest in adding a JsonType?  I could look at putting a
> patch together.
>
> Thanks,
> Ben
>



--
http://twitter.com/tjake
