MIME-Version: 1.0
In-Reply-To: <2348050A-D905-444E-9DA3-346914613E03@venarc.com>
References: <1FFFD2A1-45A9-4231-92D2-DE89177213AE@gmail.com> <2348050A-D905-444E-9DA3-346914613E03@venarc.com>
Date: Thu, 29 Mar 2012 00:04:34 -0700
Subject: Re: Document storage
From: Ben McCann
To: dev@cassandra.apache.org
Content-Type: multipart/alternative; boundary=e89a8f3bb0cf20039b04bc5c5687

--e89a8f3bb0cf20039b04bc5c5687
Content-Type: text/plain; charset=ISO-8859-1

Sounds awesome, Drew. Mind sharing your custom type? I just wrote a basic
JSON type and did the validation the same way you did, but I don't have any
SMILE support yet. It seems that if your type were committed to the
Cassandra codebase, then the issue you ran into of the CLI only supporting
built-in types would no longer be a problem for you (though fixing the issue
anyway would be good, and I voted for it).

Btw, any reason you compress it with Snappy yourself instead of just setting
sstable_compression to SnappyCompressor and letting Cassandra do that part?

-Ben


On Wed, Mar 28, 2012 at 11:28 PM, Drew Kutcharian wrote:

> I'm actually doing something almost the same. I serialize my objects into
> byte[] using Jackson's SMILE format, then compress it using Snappy then
> store the byte[] in Cassandra. I actually created a simple Cassandra Type
> for this but I hit a wall with cassandra-cli:
>
> https://issues.apache.org/jira/browse/CASSANDRA-4081
>
> Please vote on the JIRA if you are interested.
>
> Validation is pretty simple, you just need to read the value and parse it
> using Jackson, if you don't get any exceptions your JSON/Smile is valid ;)
>
> -- Drew
>
>
> On Mar 28, 2012, at 9:28 PM, Ben McCann wrote:
>
> > I don't imagine sort is a meaningful operation on JSON data. As long as
> > the sorting is consistent I would think that should be sufficient.
> >
> >
> > On Wed, Mar 28, 2012 at 8:51 PM, Edward Capriolo wrote:
> >
> >> Some work I did stores JSON blobs in columns. The question on JSON
> >> type is how to sort it.
> >>
> >> On Wed, Mar 28, 2012 at 7:35 PM, Jeremy Hanna wrote:
> >>> I don't speak for the project, but you might give it a day or two for
> >>> people to respond and/or perhaps create a jira ticket. Seems like
> >>> that's a reasonable data type that would get some traction - a json
> >>> type. However, what would validation look like? That's one of the
> >>> main reasons there are the data types and validators, in order to
> >>> validate on insert.
> >>>
> >>> On Mar 29, 2012, at 12:27 AM, Ben McCann wrote:
> >>>
> >>>> Any thoughts? I'd like to submit a patch, but only if it will be
> >>>> accepted.
> >>>>
> >>>> Thanks,
> >>>> Ben
> >>>>
> >>>>
> >>>> On Wed, Mar 28, 2012 at 8:58 AM, Ben McCann wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I was wondering if it would be interesting to add some type of
> >>>>> document-oriented data type.
> >>>>>
> >>>>> I've found it somewhat awkward to store document-oriented data in
> >>>>> Cassandra today. I can make a JSON/Protobuf/Thrift, serialize it,
> >>>>> and store it, but Cassandra cannot differentiate it from any other
> >>>>> string or byte array. However, if my column validation_class could
> >>>>> be a JsonType that would allow tools to potentially do more
> >>>>> interesting introspection on the column value. E.g. bug 3647
> >>>>> <https://issues.apache.org/jira/browse/CASSANDRA-3647> calls for
> >>>>> supporting arbitrarily nested "documents" in CQL.
> >>>>> Running a query against the JSON column in Pig is possible as well,
> >>>>> but again in this use case it would be helpful to be able to encode
> >>>>> in column metadata that the column is stored as JSON. For
> >>>>> debugging, running nightly reports, etc. it would be quite useful
> >>>>> compared to the opaque string and byte array types we have today.
> >>>>> JSON is appealing because it would be easy to implement. Something
> >>>>> like Thrift or Protocol Buffers would actually be interesting since
> >>>>> they would be more space efficient. However, they would also be a
> >>>>> bit more difficult to implement because of the extra typing
> >>>>> information they provide. I'm hoping with Cassandra 1.0's addition
> >>>>> of compression that storing JSON is not too inefficient.
> >>>>>
> >>>>> Would there be interest in adding a JsonType? I could look at
> >>>>> putting a patch together.
> >>>>>
> >>>>> Thanks,
> >>>>> Ben
> >>>>>

--e89a8f3bb0cf20039b04bc5c5687--
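
The validation approach discussed in this thread (parse the stored value and treat any parser exception as "invalid") can be sketched roughly as follows. This is only an illustration under stated assumptions, not the custom type from the thread: it uses Python's standard json module as a stand-in for Jackson, zlib as a stand-in for Snappy, and the function names is_valid_json, pack and unpack_and_validate are hypothetical.

```python
import json
import zlib


def is_valid_json(value: bytes) -> bool:
    """Return True if `value` decodes as UTF-8 and parses as JSON.

    Mirrors the "parse it, and if no exception is raised it is valid" idea;
    the stdlib json module stands in for Jackson here (an assumption).
    """
    try:
        json.loads(value.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        # json.JSONDecodeError is a subclass of ValueError.
        return False
    return True


def pack(document) -> bytes:
    """Serialize to JSON, then compress on the client side.

    zlib is a stand-in for Snappy (an assumption, to stay in the stdlib).
    """
    return zlib.compress(json.dumps(document).encode("utf-8"))


def unpack_and_validate(stored: bytes) -> bool:
    """Decompress a stored value and check that the payload parses as JSON."""
    try:
        raw = zlib.decompress(stored)
    except zlib.error:
        return False
    return is_valid_json(raw)
```

Note that when the client compresses the value itself, a server-side validator has to decompress before it can parse, which is one practical difference from leaving compression to the storage layer.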