From: Apache Wiki
To: commits@jackrabbit.apache.org
Date: Fri, 17 Jul 2009 15:22:51 -0000
Message-ID: <20090717152251.16940.80128@eos.apache.org>
Subject: [Jackrabbit Wiki] Update of "DataStore" by DarrenHartford

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jackrabbit Wiki" for change notification.

The following page has been changed by DarrenHartford:
http://wiki.apache.org/jackrabbit/DataStore

------------------------------------------------------------------------------
   * Fulltext search and meta data extraction could be done when storing the object (only once per object) and stored next to the object.
   * Clients should first send the checksum and size of large objects when they store something (import, adding or updating data); in many cases the actual data then does not need to be sent.
   * Speed up garbage collection. One idea is to use 'back references' for larger objects: each larger object would know the set of nodes that reference it. This would be an 'append only' set, meaning that at runtime links are only added, never removed. Only the garbage collection process removes links. The garbage collection would first update links for large objects (this process could stop at the first link that still exists). That way large objects can be removed quickly if they are not used any more. Afterwards, objects with a low use count should be scanned. This algorithm wouldn't necessarily speed up the total garbage collection time, but it would free up space more quickly.
-  * Compressed Datastore (file, db) - if the content type and size make it likely to have large disk space savings if compressed, set the datastore to auto-compress (whether zip, gzip, bz, etc.). File datastore is more likely to have this feature than DB. (user added 7/17/2009)
+  * Auto-Compressing Datastore (file, db) - if a specific file's content type and size make large disk space savings likely, set the datastore to auto-compress it (whether zip, gzip, bz, etc.). The file datastore is more likely to get this feature than the DB datastore, and compression should not impact retrieval or other normal usage. (user added 7/17/2009)
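
The auto-compress idea in the added line could look roughly like the sketch below. It is only an illustration, not part of the Jackrabbit DataStore API: the class name, the 4 KB threshold, the list of "already compressed" content types and the one-byte format marker are all assumptions made for the example. Each stored record is prefixed with a marker byte so that retrieval can decompress transparently, and the raw bytes are kept whenever gzip would not actually save space.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Set;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch of an auto-compressing store; not Jackrabbit code. */
final class AutoCompressSketch {

    // Assumed threshold: below this size compression is unlikely to pay off.
    static final int MIN_SIZE_BYTES = 4096;

    // Assumed list of content types that are already compressed.
    static final Set<String> ALREADY_COMPRESSED =
            Set.of("image/jpeg", "image/png", "application/zip", "application/gzip");

    // Assumed one-byte marker written in front of every stored record.
    static final byte RAW = 0;
    static final byte GZIP = 1;

    /** Decide from content type and size whether compression is worth trying. */
    static boolean shouldCompress(String contentType, long size) {
        return size >= MIN_SIZE_BYTES && !ALREADY_COMPRESSED.contains(contentType);
    }

    /** Returns the bytes to write to disk: marker byte followed by the payload. */
    static byte[] encode(String contentType, byte[] data) {
        if (shouldCompress(contentType, data.length)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            out.write(GZIP);
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(data);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            byte[] packed = out.toByteArray();
            if (packed.length < data.length + 1) {
                return packed; // compression actually saved space
            }
        }
        // Fall back to storing the data unchanged behind a RAW marker.
        byte[] raw = new byte[data.length + 1];
        raw[0] = RAW;
        System.arraycopy(data, 0, raw, 1, data.length);
        return raw;
    }

    /** Reverses encode(); callers never see whether the record was compressed. */
    static byte[] decode(byte[] stored) {
        if (stored[0] == RAW) {
            return Arrays.copyOfRange(stored, 1, stored.length);
        }
        try (GZIPInputStream gz = new GZIPInputStream(
                new ByteArrayInputStream(stored, 1, stored.length - 1))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because `decode` hides the marker byte, code that reads a record would not need to know whether it was compressed, which is one way to meet the "should not impact retrieval" requirement.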
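
The "send the checksum and size first" idea from the list above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the Jackrabbit API: the class and method names are hypothetical, SHA-256 is an assumed choice of checksum, and a plain map stands in for the server-side store. The client asks whether a record with that checksum and size already exists and only transfers the bytes if it does not.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a checksum-first store; not Jackrabbit code. */
final class ChecksumFirstSketch {

    // Stand-in for the server-side store, keyed by hex checksum.
    private final Map<String, byte[]> records = new HashMap<>();

    // Counts how often the full payload actually had to be transferred.
    int uploads = 0;

    /** Hex-encoded SHA-256 of the data (assumed checksum algorithm). */
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Cheap first request: only the checksum and size are sent. */
    boolean contains(String checksum, long size) {
        byte[] existing = records.get(checksum);
        return existing != null && existing.length == size;
    }

    /** Expensive second request: the full payload is sent. */
    void upload(byte[] data) {
        records.put(sha256Hex(data), data.clone());
        uploads++;
    }

    /** Client-side logic: skip the upload when the store already has the object. */
    String store(byte[] data) {
        String checksum = sha256Hex(data);
        if (!contains(checksum, data.length)) {
            upload(data);
        }
        return checksum;
    }
}
```

Storing the same content twice through `store` would transfer the payload only once; the second call returns the same identifier after the cheap `contains` check.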