jackrabbit-dev mailing list archives

From "Thomas Mueller (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-926) Global data store for binaries
Date Thu, 30 Aug 2007 10:28:31 GMT

    [ https://issues.apache.org/jira/browse/JCR-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523814 ]

Thomas Mueller commented on JCR-926:
------------------------------------

Hi,

> great work !
Most of it was Jukka's work.

> configure the datastore
That's the next priority. Currently, you need to use system properties.
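
For illustration only (the property name below is a placeholder, not necessarily a key the
current patch reads; check the attached patches for the real ones), setting such a property
before starting the repository would look like this:

    // Illustration only: the property name is a placeholder, not a documented key.
    public class DataStoreConfigExample {
        public static void main(String[] args) {
            // tell the data store where to keep its files (placeholder key and path)
            System.setProperty("org.apache.jackrabbit.dataStore.path",
                    "/var/jackrabbit/datastore");
            // ... then start the repository as usual
        }
    }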

> each workspace can have its own datastore
No, there is only one data store per repository. Technically it would be possible to store
the files separately per workspace, but in my view a lot of the benefit would be lost.

> different backup solutions
Backup (online and incremental) is important. It is very easy to back up the data store: just
copy all the files. They are never modified in place, only renamed from temp file to live file,
and deleted only when no longer used (by the garbage collector). But I know of only a few use
cases: 'backup everything', 'backup incrementally', 'backup with data compression', 'backup
with encryption', and 'restore'. What other backup use cases / solutions do you have in mind?
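
To make the 'backup everything' and 'backup incrementally' cases concrete, here is a minimal
sketch (not code from the patch; paths are placeholders). Because the files are immutable, a
file that already exists in the backup can simply be skipped, which makes a re-run effectively
incremental:

    import java.io.IOException;
    import java.nio.file.*;
    import java.util.stream.Stream;

    // Example only: copies the data store directory tree to a backup location.
    // Because live files are never modified in place, this is safe to run
    // while the repository is online.
    public class DataStoreBackup {
        public static void main(String[] args) throws IOException {
            Path source = Paths.get("repository/datastore");
            Path target = Paths.get("backup/datastore");
            try (Stream<Path> files = Files.walk(source)) {
                for (Path p : (Iterable<Path>) files::iterator) {
                    Path dest = target.resolve(source.relativize(p));
                    if (Files.isDirectory(p)) {
                        Files.createDirectories(dest);
                    } else if (!Files.exists(dest)) {
                        // files already copied in an earlier run are skipped
                        Files.copy(p, dest);
                    }
                }
            }
        }
    }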

> Global data store for binaries
> ------------------------------
>
>                 Key: JCR-926
>                 URL: https://issues.apache.org/jira/browse/JCR-926
>             Project: Jackrabbit
>          Issue Type: New Feature
>          Components: core
>            Reporter: Jukka Zitting
>         Attachments: dataStore.patch, DataStore.patch, DataStore2.patch, dataStore3.patch, dataStore4.zip, dataStore5-garbageCollector.patch, internalValue.patch, ReadWhileSaveTest.patch
>
>
> There are three main problems with the way Jackrabbit currently handles large binary values:
> 1) Persisting a large binary value blocks access to the persistence layer for extended amounts of time (see JCR-314)
> 2) At least two copies of binary streams are made when saving them through the JCR API: one in the transient space, and one when persisting the value
> 3) Versioning and copy operations on nodes or subtrees that contain large binary values can quickly end up consuming excessive amounts of storage space.
> To solve these issues (and to get other nice benefits), I propose that we implement a global "data store" concept in the repository. A data store is an append-only set of binary values that uses short identifiers to identify and access the stored binary values. The data store would trivially fit the requirements of transient space and transaction handling due to the append-only nature. An explicit mark-and-sweep garbage collection process could be added to avoid concerns about storing garbage values.
> See the recent NGP value record discussion, especially [1], for more background on this idea.
> [1] http://mail-archives.apache.org/mod_mbox/jackrabbit-dev/200705.mbox/%3c510143ac0705120919k37d48dc1jc7474b23c9f02cbd@mail.gmail.com%3e
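
For readers new to the idea, here is a minimal, purely illustrative sketch of such an
append-only, content-addressed store (this is not the code in the attached patches): the
identifier is the hash of the content, and a record is first written to a temp file and then
renamed to its final name, which is why backups can simply copy files.

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.*;
    import java.security.DigestInputStream;
    import java.security.MessageDigest;

    // Illustrative sketch of an append-only, content-addressed data store.
    public class SimpleDataStore {
        private final Path root;

        public SimpleDataStore(Path root) throws IOException {
            this.root = Files.createDirectories(root);
        }

        // Adds a binary and returns its identifier (hex-encoded SHA-1 of the content).
        public String addRecord(InputStream in) throws Exception {
            Path temp = Files.createTempFile(root, "upload", ".tmp");
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            try (DigestInputStream din = new DigestInputStream(in, digest)) {
                Files.copy(din, temp, StandardCopyOption.REPLACE_EXISTING);
            }
            String id = hex(digest.digest());
            Path target = root.resolve(id);
            if (Files.exists(target)) {
                Files.delete(temp);   // identical content is already stored once
            } else {
                Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
            }
            return id;
        }

        // Opens the binary identified by the given identifier.
        public InputStream getRecord(String id) throws IOException {
            return Files.newInputStream(root.resolve(id));
        }

        private static String hex(byte[] bytes) {
            StringBuilder sb = new StringBuilder();
            for (byte b : bytes) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        }
    }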

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

