Date: Wed, 28 Mar 2012 19:54:13 -0700
Subject: Re: Document storage
From: Tatu Saloranta
To: dev@cassandra.apache.org

On Wed, Mar 28, 2012 at 6:59 PM, Jeremiah Jordan wrote:
> Sounds interesting to me. I looked into adding protocol buffer support at one point, and it didn't look like it would be too much work. The tricky part was I also wanted to add indexing support for attributes of the inserted protocol buffers. That looked a little trickier, but still not impossible. Though other stuff came up and I never got around to actually writing any code.
> JSON support would be nice, especially if you figured out how to get built in indexing of the attributes inside the JSON to work =).

Also, for whatever it's worth, it should be trivial to add support for Smile (binary JSON serialization):

http://wiki.fasterxml.com/SmileFormatSpec

since its logical data structure is pure JSON, with no extensions or subsetting. The main Java implementation is from the Jackson project, but there is also a C codec (https://github.com/pierre/libsmile), and prototypes for PHP and Ruby bindings as well.
For all data it is a bit faster and a bit more compact: about 30% for individual items, and more (40 - 70%) for data sequences (due to optional back-referencing).

JSON and Smile can be auto-detected from the first 4 bytes or so, reliably and efficiently, so one should be able to add this either transparently or explicitly.
One could even transcode things on the fly -- store as Smile, expose filtered results as JSON (and accept JSON or both).
This could reduce storage cost while keeping the benefits of a flexible data format.

-+ Tatu +-
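
To illustrate the auto-detection point, here is a minimal sketch in plain Java. It relies on the Smile header, which is the three bytes ':' ')' '\n' followed by one version/flags byte; valid JSON can never start with that sequence. The class name FormatSniffer and its Format enum are hypothetical, not part of Cassandra or Jackson. The sketch also assumes stored JSON is UTF-8 and has an object or array at the root, so anything else is reported as UNKNOWN.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: guesses whether a stored document is Smile or JSON. */
final class FormatSniffer {
    enum Format { SMILE, JSON, UNKNOWN }

    static Format detect(byte[] data) {
        // Smile header: ':' ')' '\n' plus one version/flags byte (4 bytes total).
        if (data.length >= 4
                && data[0] == 0x3A && data[1] == 0x29 && data[2] == 0x0A) {
            return Format.SMILE;
        }
        int i = 0;
        // Skip an optional UTF-8 byte order mark.
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            i = 3;
        }
        // Skip leading JSON whitespace (space, tab, LF, CR).
        while (i < data.length && isJsonWhitespace(data[i])) {
            i++;
        }
        // Assumption: documents have an object or array at the root.
        if (i < data.length && (data[i] == '{' || data[i] == '[')) {
            return Format.JSON;
        }
        return Format.UNKNOWN;
    }

    private static boolean isJsonWhitespace(byte b) {
        return b == ' ' || b == '\t' || b == '\n' || b == '\r';
    }

    public static void main(String[] args) {
        byte[] smile = {0x3A, 0x29, 0x0A, 0x00};
        byte[] json = "{\"name\":\"value\"}".getBytes(StandardCharsets.UTF_8);
        System.out.println(detect(smile));
        System.out.println(detect(json));
    }
}
```

A storage layer could call detect on each incoming value and pick the matching parser, which is what would make transparent support possible.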