From: "Thomas Mueller (JIRA)"
To: dev@jackrabbit.apache.org
Date: Thu, 6 Sep 2007 03:26:31 -0700 (PDT)
Subject: [jira] Commented: (JCR-926) Global data store for binaries

[ https://issues.apache.org/jira/browse/JCR-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525395 ]

Thomas Mueller commented on JCR-926:
------------------------------------

Revision 573209: Configuration is now supported.
The system property 'org.jackrabbit.useDataStore' is still required to enable this feature, but the data store class (and, for the FileDataStore, the path) can now be configured:

....

The DataStore API was changed slightly to support this. The DataStore configuration is optional; if it is missing, the system works almost as it does now. Almost, because the BLOBValue class is no longer used.

The system property org.jackrabbit.useDataStore will be removed once this is tested. The system property org.jackrabbit.minBlobFileSize will also be integrated into DataStore. My idea is that each data store implementation (file system, database, S3?) can have a different 'minimum size', depending on the overhead of storing and loading a value.

By the way, the FileDataStore overhead (mainly calculating the SHA-1 digest) is quite low, less than 10%:

Writing and reading 5 files of 100 KB each, average over 5 runs:
FileDataStore: 1390 ms, FileOutputStream: 1287 ms

> Global data store for binaries
> ------------------------------
>
>                 Key: JCR-926
>                 URL: https://issues.apache.org/jira/browse/JCR-926
>             Project: Jackrabbit
>          Issue Type: New Feature
>          Components: core
>            Reporter: Jukka Zitting
>         Attachments: dataStore.patch, DataStore.patch, DataStore2.patch, dataStore3.patch, dataStore4.zip, dataStore5-garbageCollector.patch, internalValue.patch, ReadWhileSaveTest.patch
>
>
> There are three main problems with the way Jackrabbit currently handles large binary values:
> 1) Persisting a large binary value blocks access to the persistence layer for extended amounts of time (see JCR-314)
> 2) At least two copies of binary streams are made when saving them through the JCR API: one in the transient space, and one when persisting the value
> 3) Versioning and copy operations on nodes or subtrees that contain large binary values can quickly end up consuming excessive amounts of storage space.
> To solve these issues (and to get other nice benefits), I propose that we implement a global "data store" concept in the repository. A data store is an append-only set of binary values that uses short identifiers to identify and access the stored binary values. The data store would trivially fit the requirements of transient space and transaction handling due to the append-only nature. An explicit mark-and-sweep garbage collection process could be added to avoid concerns about storing garbage values.
> See the recent NGP value record discussion, especially [1], for more background on this idea.
> [1] http://mail-archives.apache.org/mod_mbox/jackrabbit-dev/200705.mbox/%3c510143ac0705120919k37d48dc1jc7474b23c9f02cbd@mail.gmail.com%3e

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
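
For readers of the archive, the append-only, identifier-addressed store described in the issue, together with the SHA-1 digest mentioned in the comment, can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the code from the attached patches: the class name SketchDataStore and the methods addRecord, getRecord and sweep are hypothetical, records live in a single flat directory, and concurrent writers are not handled.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; this is NOT Jackrabbit's DataStore API.
class SketchDataStore {
    private final Path dir;

    SketchDataStore(Path dir) throws IOException {
        this.dir = Files.createDirectories(dir);
    }

    // Appends a value and returns its identifier (hex SHA-1 of the content).
    String addRecord(InputStream in) throws IOException {
        MessageDigest sha1;
        try {
            sha1 = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IOException(e);
        }
        Path tmp = Files.createTempFile(dir, "tmp", ".part");
        try (DigestInputStream din = new DigestInputStream(in, sha1)) {
            // The digest is updated while the stream is copied,
            // so the value is read only once.
            Files.copy(din, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        StringBuilder id = new StringBuilder();
        for (byte b : sha1.digest()) {
            id.append(String.format("%02x", b));
        }
        Path target = dir.resolve(id.toString());
        if (Files.exists(target)) {
            Files.delete(tmp);        // identical content is already stored
        } else {
            Files.move(tmp, target);  // existing records are never modified
        }
        return id.toString();
    }

    // Opens a stored value by identifier.
    InputStream getRecord(String id) throws IOException {
        return Files.newInputStream(dir.resolve(id));
    }

    // Sweep phase of mark-and-sweep: deletes every file whose name is not
    // in the marked set and returns the number of files removed.
    int sweep(Set<String> marked) throws IOException {
        List<Path> all;
        try (Stream<Path> files = Files.list(dir)) {
            all = files.collect(Collectors.toList());
        }
        int removed = 0;
        for (Path p : all) {
            if (!marked.contains(p.getFileName().toString())) {
                Files.delete(p);
                removed++;
            }
        }
        return removed;
    }
}
```

In this sketch, storing the same bytes twice yields the same identifier and a single file, which is how such a store would avoid duplicate copies on versioning and copy operations. The mark phase (collecting the identifiers still referenced by the repository) is left to the caller.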