From: Paul Davis <paul.joseph.davis@gmail.com>
Date: Tue, 4 Oct 2011 14:33:00 -0500
Subject: Re: Universal Binary JSON in CouchDB
To: dev@couchdb.apache.org

For a first step I'd prefer to see a patch that makes the HTTP responses
choose a content type based on Accept headers. Once we see what that looks
like and how/if it changes performance, then *maybe* we can start talking
about on-disk formats. Changing how we store things on disk is a fairly
high-impact change that we'll need to consider carefully.

That said, the UBJSON spec is starting to look reasonable and capable
enough to be an alternative content type produced by CouchDB. If someone
were to write a patch I'd review it quite enthusiastically.

On Tue, Oct 4, 2011 at 2:23 PM, Riyad Kalla wrote:
> Hey Randall,
>
> This is something that Paul and I discussed on IRC. The way UBJ is written
> out looks something like this ([] blocks are just for readability):
>
> [o][2]
>   [s][4][name][s][3][bob]
>   [s][3][age][i][31]
>
> Couch can easily prepend or append its own dynamic content in a reply. If
> it wants to prepend some information after the object header, the header
> would need to be stored and manipulated by Couch separately.
>
> For example, if I upload the doc above, Couch would want to take that root
> object header of:
>
> [o][2]
>
> and change it to:
>
> [o][4]
>
> before storing it, because of the additions of _id and _rev. Actually this
> could be as simple as storing a "rootObjectCount" and having Couch
> dynamically generate the root every time.
>
> 'o' represents object containers with <= 254 elements (1 byte for length)
> and 'O' represents object containers with up to 2.1 billion elements
> (4-byte int).
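The byte layout Riyad describes above, including the root-count rewrite for
_id/_rev, can be sketched in a few lines of Python. This follows the layout
exactly as written in the message (one marker byte, a 1-byte length or count,
then the payload) and uses a 1-byte 'i' integer payload to match the [i][31]
example; the actual draft spec defines several fixed-width numeric types, and
the _id/_rev values are made up, so treat this as an illustration rather than
a reference encoder.

```python
def ubj_str(s: str) -> bytes:
    # 's' marker, 1-byte length, then the UTF-8 bytes (small-string form)
    data = s.encode("utf-8")
    assert len(data) <= 254, "longer strings would need the 4-byte 'S' form"
    return b"s" + bytes([len(data)]) + data

def ubj_int(n: int) -> bytes:
    # 'i' marker with a 1-byte payload, matching the [i][31] example above;
    # the spec itself has wider integer types for larger values
    return b"i" + bytes([n])

def ubj_obj(d: dict) -> bytes:
    # 'o' marker, 1-byte element count, then alternating key/value encodings
    assert len(d) <= 254, "larger objects would need the 4-byte 'O' form"
    out = b"o" + bytes([len(d)])
    for key, val in d.items():
        out += ubj_str(key)
        out += ubj_int(val) if isinstance(val, int) else ubj_str(val)
    return out

doc = ubj_obj({"name": "bob", "age": 31})
# [o][2] [s][4]name [s][3]bob [s][3]age [i][31]

# The header rewrite described above: bump the root count from 2 to 4 and
# splice in _id/_rev pairs (values here are hypothetical)
patched = (b"o" + bytes([4])
           + ubj_str("_id") + ubj_str("doc1")
           + ubj_str("_rev") + ubj_str("1-abc")
           + doc[2:])  # everything after the original [o][count] header
```

Note that the stored pairs after the two-byte root header pass through the
rewrite untouched, which is what makes the "store a rootObjectCount and
regenerate the root" idea cheap.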
> If Couch did that, any request coming into the server might look like this:
>
> <- client request
> -- (server loads root object count)
> -> server writes back object header: [o][4]
> -- (server calculates dynamic data)
> -> server writes back dynamic content
> -> server streams raw record data straight off disk to client (no
>    deserialization)
> -- (OPT: server calculates dynamic data)
> -> OPT: server streams dynamic data appended
>
> Thoughts?
>
> Best,
> Riyad
>
> P.S. There is support in the spec for unbounded container types for when
> Couch doesn't know how much it is streaming back, but that isn't necessary
> for retrieving stored docs (though it could be handy when responding to
> view queries and other requests whose length is not known in advance).
>
> On Tue, Oct 4, 2011 at 12:02 PM, Randall Leeds wrote:
>
>> Hey,
>>
>> Thanks for this thread.
>>
>> I've been interested in ways to reduce the work from disk to client as
>> well. Unfortunately, the metadata inside the document objects is variable
>> based on query parameters (_attachments, _revisions, _revs_info...), so
>> the server needs to decode the disk binary anyway.
>>
>> I would say this is something we should carefully consider for a 2.0 API.
>> I know that, for simplicity, many people really like having the
>> underscore-prefixed attributes mixed in right alongside the document
>> data, but a future API that separated these could really make things fly.
>>
>> -Randall
>>
>> On Wed, Sep 28, 2011 at 22:25, Benoit Chesneau wrote:
>>
>> > On Thursday, September 29, 2011, Riyad Kalla wrote:
>> > > DISCLAIMER: This looks long, but reads quickly (I hope). If you are
>> > > in a rush, just check the last 2 sections and see if it sounds
>> > > interesting.
>> > >
>> > > Hi everybody. I am new to the list, but a big fan of Couch, and I
>> > > have been working on something I wanted to share with the group.
>> > > My apologies if this isn't the right venue or list etiquette... I
>> > > wasn't really sure where to start with this conversation.
>> > >
>> > > Background
>> > > =====================
>> > > With the help of the JSON spec community I've been finalizing a
>> > > universal, binary JSON format specification that offers 1:1
>> > > compatibility with JSON.
>> > >
>> > > The full spec is here (http://ubjson.org/) and the quick list of
>> > > types is here (http://ubjson.org/type-reference/). Differences with
>> > > existing specs and "why" are all addressed on the site in the first
>> > > few sections.
>> > >
>> > > The goals of the specification were, first, to maintain 1:1
>> > > compatibility with JSON (no custom data structures - like what
>> > > caused BSON to be rejected in Issue #702); second, to be as simple
>> > > to work with as regular JSON (no complex data structures or
>> > > encoding/decoding algorithms to implement); and last, to be smaller
>> > > than compacted JSON and faster to generate and parse.
>> > >
>> > > Using a test doc that I see Filipe reference in a few of his issues
>> > > (http://friendpaste.com/qdfyId8w1C5vkxROc5Thf) I get the following
>> > > compression:
>> > >
>> > > * Compacted JSON: 3,861 bytes
>> > > * Univ. Binary JSON: 3,056 bytes (20% smaller)
>> > >
>> > > In some other sample data (e.g. the jvm-serializers sample data) I
>> > > see a 27% compression, with a typical compression range of 20-30%.
>> > >
>> > > While these compression levels are average, the data is written out
>> > > in an unmolested format that is optimized for read speed (no
>> > > scanning for null terminators) and criminally simple to work with.
(win-win)
>> > >
>> > > I added more clarifying information about compression
>> > > characteristics in the "Size Requirements" section of the spec for
>> > > anyone interested.
>> > >
>> > > Motivation
>> > > ======================
>> > > I've been following the discussions surrounding a native binary JSON
>> > > format for the core CouchDB file (Issue #1092), which transformed
>> > > into keeping the format and utilizing Google's Snappy (Issue #1120)
>> > > to provide what looks to be roughly a 40-50% reduction in file size
>> > > at the cost of running the compression/decompression on every
>> > > read/write.
>> > >
>> > > I realize that, in light of the HTTP transport and the JSON
>> > > encoding/decoding cycle in CouchDB, the Snappy compression cycles
>> > > are a very small part of the total time the server spends working.
>> > >
>> > > I found this all interesting, but like I said, I realized up to this
>> > > point that Snappy wasn't any form of bottleneck and the big
>> > > compression wins server side were great, so I had nothing to
>> > > contribute to the conversation.
>> > >
>> > > Catalyst
>> > > ======================
>> > > This past week I watched Tim Anglade's presentation
>> > > (http://goo.gl/LLucD) and started to foam at the mouth when I saw
>> > > his slides where he skipped the JSON encode/decode cycle server-side
>> > > and just generated straight from binary on disk into MessagePack and
>> > > got some phenomenal speedups from the server:
>> > > http://i.imgscalr.com/XKqXiLusT.png
>> > >
>> > > I pinged Tim to see what the chances of adding Univ Binary JSON
>> > > support were, and he seemed amenable to the idea as long as I could
>> > > hand him an Erlang or Ruby impl (unfortunately, I am not familiar
>> > > with either).
>> > > ah-HA! moment
>> > > ======================
>> > > Today it occurred to me that if CouchDB were able to use the
>> > > Universal Binary JSON format as its native storage format (at the
>> > > cost of 20% more disk space than it is using with Snappy enabled,
>> > > but still 20% *less* than before Snappy was integrated) AND support
>> > > for serving replies using the same format were added (a la Tim's
>> > > work), this would allow CouchDB to (theoretically) reply to queries
>> > > by pulling bytes off disk (or memory) and immediately streaming them
>> > > back to the caller with no intermediary step at all (no Snappy
>> > > decompress, no Erlang decode, no JSON encode).
>> > >
>> > > Given that the Univ Binary JSON spec is standard, easy to parse and
>> > > simple to convert back to JSON, adding support for it seemed more
>> > > consistent with Couch's motto of ease and simplicity than, say,
>> > > MessagePack or Protobuf, which provide better compression but at the
>> > > cost of more complex formats and data types that have no analog in
>> > > JSON.
>> > >
>> > > I don't know the intricacies of Couch's internals; if that is wrong
>> > > and some Erlang manipulation of the data would still be required, I
>> > > believe it would still be faster to pull the data off disk in the
>> > > Univ Binary JSON format, decode to Erlang native types and then
>> > > reply, while skipping the Snappy decompression step.
>> > >
>> > > If it *would* be possible to stream it back untouched directly from
>> > > disk, that seems like an enhancement that could potentially speed up
>> > > CouchDB by at least an order of magnitude.
>> > > Conclusion
>> > > =======================
>> > > I would appreciate any feedback on this idea from you guys with a
>> > > lot more knowledge of the internals.
>> > >
>> > > I have no problem if this is a horrible idea and never going to
>> > > happen; I just wanted to try and contribute something back.
>> > >
>> > > Thank you all for reading.
>> > >
>> > > Best wishes,
>> > > Riyad
>> > >
>> >
>> > What is universal in something new?
>> >
>> > - benoit
>> >
>>
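The request flow Riyad sketches in the thread (load the stored root count,
emit a rewritten root header, write the dynamic pairs, then stream the stored
bytes untouched) could be mocked up as below. Everything here is hypothetical:
the `respond` function, the stored-record layout, and the dynamic fields are
stand-ins for illustration, not CouchDB internals or APIs.

```python
def respond(stored: bytes, stored_count: int, dynamic: dict) -> bytes:
    # 1. Rewrite the root header: stored element count plus the pairs
    #    we are about to prepend (the "rootObjectCount" idea from the thread)
    chunks = [b"o" + bytes([stored_count + len(dynamic)])]
    # 2. Write the dynamic content (e.g. _id/_rev), encoded on the fly
    #    using the small-string form: 's', 1-byte length, UTF-8 bytes
    for key, val in dynamic.items():
        for s in (key, val):
            data = s.encode("utf-8")
            chunks.append(b"s" + bytes([len(data)]) + data)
    # 3. Stream the stored pairs straight through with no deserialization,
    #    skipping only the stored [o][count] header
    chunks.append(stored[2:])
    return b"".join(chunks)
```

With a two-field record like the name/age doc from the thread, calling
`respond(record, 2, {"_id": "doc1", "_rev": "1-abc"})` yields a four-element
object whose tail is byte-identical to what was on disk, which is the whole
point of the proposal: only the header and the dynamic prefix are computed
per request.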