jackrabbit-dev mailing list archives

From Claus Köll (JIRA) <j...@apache.org>
Subject [jira] Commented: (JCR-926) Global data store for binaries
Date Fri, 31 Aug 2007 06:16:32 GMT

    [ https://issues.apache.org/jira/browse/JCR-926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12524012 ]

Claus Köll commented on JCR-926:

OK, if you have a use case as you described, I think a global data store is the best way to make
cross-workspace operations easier: you have only one file and no copies of the same file.
If the road leads toward a centralized repository, a global data store of course makes sense.

In my case (and I think others have a similar use case) a per-workspace data store makes
things easier.
I am working for a government agency, and the office employees get a lot of paper every day.
They scan it and put it into Jackrabbit.
By law, we must keep the documents for 5-7 years with fast read access in Jackrabbit.
After that time we can archive them (slow access), and therefore
we want to store these documents not on SAN storage (because it is expensive) but on
a cheaper storage system (a tape drive system).
We had planned to do this by moving the data from one workspace (SAN) to another one (tape
drive system).
With a global data store, I think this is not possible.

How would you solve such a scenario?

> Global data store for binaries
> ------------------------------
>                 Key: JCR-926
>                 URL: https://issues.apache.org/jira/browse/JCR-926
>             Project: Jackrabbit
>          Issue Type: New Feature
>          Components: core
>            Reporter: Jukka Zitting
>         Attachments: dataStore.patch, DataStore.patch, DataStore2.patch, dataStore3.patch,
dataStore4.zip, dataStore5-garbageCollector.patch, internalValue.patch, ReadWhileSaveTest.patch
> There are three main problems with the way Jackrabbit currently handles large binary values:
> 1) Persisting a large binary value blocks access to the persistence layer for extended
amounts of time (see JCR-314)
> 2) At least two copies of binary streams are made when saving them through the JCR API:
one in the transient space, and one when persisting the value
> 3) Versioning and copy operations on nodes or subtrees that contain large binary values
can quickly end up consuming excessive amounts of storage space.
> To solve these issues (and to get other nice benefits), I propose that we implement a
global "data store" concept in the repository. A data store is an append-only set of binary
values that uses short identifiers to identify and access the stored binary values. The data
store would trivially fit the requirements of transient space and transaction handling due
to the append-only nature. An explicit mark-and-sweep garbage collection process could be
added to avoid concerns about storing garbage values.
> See the recent NGP value record discussion, especially [1], for more background on this proposal.
> [1] http://mail-archives.apache.org/mod_mbox/jackrabbit-dev/200705.mbox/%3c510143ac0705120919k37d48dc1jc7474b23c9f02cbd@mail.gmail.com%3e
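The proposal above can be illustrated with a minimal sketch. This is not Jackrabbit code — `SimpleDataStore`, `addRecord`, and `sweep` are hypothetical names — but it shows the two properties the issue relies on: an append-only store keyed by a short content-derived identifier (so saving, copying, or versioning the same binary twice stores only one record), and an explicit mark-and-sweep pass that removes records no longer referenced from the repository.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed global data store: binaries are
// stored once, keyed by a content hash, so node copies and versions
// share the same underlying record.
public class SimpleDataStore {
    private final Map<String, byte[]> records = new HashMap<>();

    // Append a binary value; returns its short identifier. Adding the
    // same content twice yields the same identifier and stores one copy.
    public String addRecord(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            String id = sb.toString();
            records.putIfAbsent(id, data.clone());
            return id;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public byte[] getRecord(String id) {
        return records.get(id);
    }

    public int size() {
        return records.size();
    }

    // Mark-and-sweep collection: the caller passes the identifiers still
    // referenced from the repository; everything else is removed.
    // Returns the number of records swept.
    public int sweep(Set<String> marked) {
        int before = records.size();
        records.keySet().retainAll(marked);
        return before - records.size();
    }

    public static void main(String[] args) {
        SimpleDataStore store = new SimpleDataStore();
        byte[] doc = "scanned document".getBytes(StandardCharsets.UTF_8);
        String id1 = store.addRecord(doc);
        String id2 = store.addRecord(doc); // a "copy" of the same binary
        System.out.println(id1.equals(id2)); // true: same identifier
        System.out.println(store.size());    // 1: only one record stored
    }
}
```

Because records are immutable and append-only, concurrent readers never see a partially written value, which is what makes the transient-space and transaction requirements trivial to satisfy.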

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
