From: Jukka Zitting
Date: Tue, 20 Nov 2012 18:24:32 +0200
Subject: Identifier- or hash-based access in the MicroKernel
To: Oak devs

Hi,

A lot of functionality in Oak (node states, the diff and hook mechanisms, etc.) is based on walking down the tree hierarchy one level at a time. To do this, for example to access changes below /a/b/c, oak-core currently requests the paths /a, /a/b, /a/b/c and so on from the underlying MK implementation.

This would work reasonably well with MK implementations that are essentially a big hash table mapping the full path (and revision) to the content at that location. Even then there's some space overhead, as even tiny nodes (think of an ACL entry) get paired with the full path (and revision) of the node. The current MongoMK with its path keys works like this, though even there a secondary index is needed for the path lookups.

The approach is less ideal for MK implementations (like the default H2-based one) that have to traverse the path whenever some content is accessed. For example, with the above oak-core access pattern, the sequence of accessed nodes would be [ a, a, b, a, b, c ], where ideally just [ a, b, c ] would suffice. The KernelNodeStore cache in oak-core prevents this from being too big an issue, but ideally we'd be able to avoid such extra levels of caching.

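As a rough illustration of that access pattern, here is a minimal Java sketch. It assumes a simplified model in which resolving a path touches every node from the root down, and the class and method names (PathTraversalDemo, resolve, walkDown) are hypothetical, not part of Oak or any MK implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of Oak or any MicroKernel implementation.
final class PathTraversalDemo {

    // Records the nodes a path-traversing MK would touch to resolve one full path.
    static List<String> resolve(String path) {
        List<String> visited = new ArrayList<>();
        for (String name : path.split("/")) {
            if (!name.isEmpty()) {
                visited.add(name);
            }
        }
        return visited;
    }

    // Simulates oak-core walking down one level at a time,
    // requesting /a, then /a/b, then /a/b/c by full path.
    static List<String> walkDown(String target) {
        List<String> accessed = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (String name : target.split("/")) {
            if (name.isEmpty()) {
                continue;
            }
            prefix.append('/').append(name);
            accessed.addAll(resolve(prefix.toString()));
        }
        return accessed;
    }

    public static void main(String[] args) {
        System.out.println(walkDown("/a/b/c")); // prints [a, a, b, a, b, c]
        System.out.println(resolve("/a/b/c"));  // prints [a, b, c]
    }
}
```

The number of node accesses grows quadratically with the depth of the target, which is the overhead the cache currently hides.
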
To solve that mismatch without impacting the overall architecture too much, I'd like to propose the following:

* When requested using the filter argument, the getNodes() call may (but is not required to) return special ":hash" or ":id" properties as parts of the (possibly otherwise empty) child node objects included in the JSON response.

* When returned by getNodes(), those values can be used by the client instead of the normal path argument when requesting the content of such child nodes using other getNodes() calls. The MK implementation is expected to automatically detect whether a given string argument is a path, a hash or an identifier, possibly as simply as looking at whether it starts with a slash.

* Both ":hash" and ":id" values are expected to uniquely identify a specific immutable state of a node. The only difference is that the inequality of two hashes implies the inequality of the referenced nodes (which can be used by oak-core to optimize some operations), whereas it's possible for two different ids to refer to nodes with the exact same content.

Such a solution would allow the following sequence

    getNodes("/") => { "a": {} }
    getNodes("/a") => { "b": {} }
    getNodes("/a/b") => { "c": {} }
    getNodes("/a/b/c") => {}

to become something like

    getNodes("/") => { "a": { ":id": "x" } }
    getNodes("x") => { "b": { ":id": "y" } }
    getNodes("y") => { "c": { ":id": "z" } }
    getNodes("z") => {}

with x, y and z being some implementation-specific identifiers, like ObjectIDs in MongoDB.

In any case the MK implementation would still be required to support access by full path.

BR,

Jukka Zitting
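
To make the proposed lookup concrete, here is a minimal Java sketch of an in-memory stand-in for an MK that accepts either a path or an identifier. Everything in it is an assumption for illustration: the IdLookupDemo class, the map-based storage, the root id "r" and the convention that anything starting with a slash is a path. It models the JSON response as a plain map from child name to ":id" value and is not the actual MicroKernel API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for an MK; names and structure are illustrative only.
final class IdLookupDemo {

    // Immutable node states keyed by an implementation-specific id;
    // each state maps a child name to that child's id.
    private final Map<String, Map<String, String>> nodesById = new HashMap<>();
    private final String rootId;

    IdLookupDemo(String rootId) {
        this.rootId = rootId;
    }

    void put(String id, Map<String, String> children) {
        nodesById.put(id, children);
    }

    // Returns child name -> ":id" value for the node addressed by a path or an id.
    Map<String, String> getNodes(String pathOrId) {
        // Assumed detection rule: a leading slash means "path", anything else is an id.
        String id = pathOrId.startsWith("/") ? resolvePath(pathOrId) : pathOrId;
        Map<String, String> children = nodesById.get(id);
        if (children == null) {
            throw new IllegalArgumentException("Unknown node: " + pathOrId);
        }
        return children;
    }

    // Access by full path stays supported: walk from the root one name at a time.
    private String resolvePath(String path) {
        String id = rootId;
        for (String name : path.split("/")) {
            if (name.isEmpty()) {
                continue;
            }
            Map<String, String> children = nodesById.get(id);
            id = (children == null) ? null : children.get(name);
            if (id == null) {
                throw new IllegalArgumentException("No such path: " + path);
            }
        }
        return id;
    }

    // Builds the /a/b/c tree from the example, using the ids x, y and z.
    static IdLookupDemo sample() {
        IdLookupDemo mk = new IdLookupDemo("r");
        mk.put("r", Collections.singletonMap("a", "x"));
        mk.put("x", Collections.singletonMap("b", "y"));
        mk.put("y", Collections.singletonMap("c", "z"));
        mk.put("z", Collections.<String, String>emptyMap());
        return mk;
    }

    public static void main(String[] args) {
        IdLookupDemo mk = sample();
        String ref = "/"; // start by path, then follow the returned ids
        for (String name : new String[] {"a", "b", "c"}) {
            ref = mk.getNodes(ref).get(name);
        }
        System.out.println(ref + " -> " + mk.getNodes(ref));
    }
}
```

In this sketch each step after the first is a single lookup by id, while a call such as getNodes("/a/b") still works and returns the same children as getNodes("y").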