incubator-cassandra-dev mailing list archives

From Ben McCann <...@benmccann.com>
Subject Re: Document storage
Date Thu, 29 Mar 2012 07:04:34 GMT
Sounds awesome Drew.  Mind sharing your custom type?  I just wrote a basic
JSON type and did the validation the same way you did, but I don't have any
SMILE support yet.  It seems that if your type were committed to the
Cassandra codebase then the issue you ran into of the CLI only supporting
built-in types would no longer be a problem for you (though fixing the
issue anyway would be good and I voted for it).  Btw, any reason you
compress it with Snappy yourself instead of just setting sstable_compression
to SnappyCompressor<http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression>and
letting Cassandra do that part?

-Ben
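
The validate-by-parsing idea discussed in this thread can be sketched roughly as follows. This is only an illustration, not code from either poster: it uses Python's standard `json` module as a stand-in for Jackson, and the names `validate_json`, `is_valid_json`, and `InvalidJsonError` are hypothetical. A real Cassandra type would do the equivalent check in Java inside its own validation hook.

```python
import json


class InvalidJsonError(ValueError):
    """Hypothetical error raised when a column value is not well-formed JSON."""


def validate_json(value: bytes) -> None:
    """Validate a raw column value by attempting to parse it.

    Mirrors the approach described in the thread: if the parser raises
    nothing, the value is accepted. Assumes values are UTF-8 encoded.
    """
    try:
        json.loads(value.decode("utf-8"))
    except ValueError as exc:
        # UnicodeDecodeError and json.JSONDecodeError are both ValueError subclasses.
        raise InvalidJsonError(f"not valid JSON: {exc}") from exc


def is_valid_json(value: bytes) -> bool:
    """Boolean convenience wrapper around validate_json."""
    try:
        validate_json(value)
    except InvalidJsonError:
        return False
    return True
```

Note that the parsed result is thrown away; the check only cares whether parsing succeeds, so every insert pays for a full parse of the value.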


On Wed, Mar 28, 2012 at 11:28 PM, Drew Kutcharian <drew@venarc.com> wrote:

> I'm actually doing something almost the same. I serialize my objects into
> byte[] using Jackson's SMILE format, then compress it using Snappy then
> store the byte[] in Cassandra. I actually created a simple Cassandra Type
> for this but I hit a wall with cassandra-cli:
>
> https://issues.apache.org/jira/browse/CASSANDRA-4081
>
> Please vote on the JIRA if you are interested.
>
> Validation is pretty simple: you just need to read the value and parse it
> using Jackson. If you don't get any exceptions, your JSON/SMILE is valid ;)
>
> -- Drew
>
>
>
> On Mar 28, 2012, at 9:28 PM, Ben McCann wrote:
>
> > I don't imagine sort is a meaningful operation on JSON data.  As long as
> > the sorting is consistent I would think that should be sufficient.
> >
> >
> > On Wed, Mar 28, 2012 at 8:51 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> >
> >> Some work I did stores JSON blobs in columns. The question on JSON
> >> type is how to sort it.
> >>
> >> On Wed, Mar 28, 2012 at 7:35 PM, Jeremy Hanna
> >> <jeremy.hanna1234@gmail.com> wrote:
> >>> I don't speak for the project, but you might give it a day or two for
> >>> people to respond and/or perhaps create a jira ticket.  Seems like that's a
> >>> reasonable data type that would get some traction - a json type.  However,
> >>> what would validation look like?  That's one of the main reasons there are
> >>> the data types and validators, in order to validate on insert.
> >>>
> >>> On Mar 29, 2012, at 12:27 AM, Ben McCann wrote:
> >>>
> >>>> Any thoughts?  I'd like to submit a patch, but only if it will be
> >>>> accepted.
> >>>>
> >>>> Thanks,
> >>>> Ben
> >>>>
> >>>>
> >>>> On Wed, Mar 28, 2012 at 8:58 AM, Ben McCann <ben@benmccann.com> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I was wondering if it would be interesting to add some type of
> >>>>> document-oriented data type.
> >>>>>
> >>>>> I've found it somewhat awkward to store document-oriented data in
> >>>>> Cassandra today.  I can make a JSON/Protobuf/Thrift, serialize it, and
> >>>>> store it, but Cassandra cannot differentiate it from any other string or
> >>>>> byte array.  However, if my column validation_class could be a JsonType
> >>>>> that would allow tools to potentially do more interesting introspection on
> >>>>> the column value.  E.g. bug 3647
> >>>>> <https://issues.apache.org/jira/browse/CASSANDRA-3647> calls for supporting
> >>>>> arbitrarily nested "documents" in CQL.  Running a
> >>>>> query against the JSON column in Pig is possible as well, but again in this
> >>>>> use case it would be helpful to be able to encode in column metadata that
> >>>>> the column is stored as JSON.  For debugging, running nightly reports, etc.
> >>>>> it would be quite useful compared to the opaque string and byte array types
> >>>>> we have today.  JSON is appealing because it would be easy to implement.
> >>>>> Something like Thrift or Protocol Buffers would actually be interesting
> >>>>> since they would be more space efficient.  However, they would also be a
> >>>>> bit more difficult to implement because of the extra typing information
> >>>>> they provide.  I'm hoping with Cassandra 1.0's addition of compression that
> >>>>> storing JSON is not too inefficient.
> >>>>>
> >>>>> Would there be interest in adding a JsonType?  I could look at putting a
> >>>>> patch together.
> >>>>>
> >>>>> Thanks,
> >>>>> Ben
> >>>>>
> >>>>>
> >>>
> >>
>
>
