jackrabbit-dev mailing list archives

From "Esteban Franqueiro" <efran...@bea.com>
Subject RE: [jira] Resolved: (JCR-926) Global data store for binaries
Date Mon, 24 Sep 2007 15:44:47 GMT
Hi Thomas.
A few comments regarding the data store.
We're very interested in a database-backed data store, and we've been
working on it for a few weeks now. A couple of issues came up that made
us modify the original interface.
The first thing that came up was that the methods in the DataStore
interface throw IOException. That doesn't fit a database-backed store,
which throws SQLException. What we did was create a new
DataStoreException and wrap the implementation-dependent exceptions in
it (wrapping an SQLException in an IOException isn't very pretty). We
also needed to change a few places in the code that expected an
IOException.
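A minimal sketch of the wrapping approach, assuming a checked DataStoreException and a trimmed-down, hypothetical store interface (SimpleDataStore, FailingDbDataStore and queryBlob are illustrative names, not Jackrabbit API). The SQL call is simulated so it runs without a database:

```java
import java.sql.SQLException;

// Hypothetical checked exception that hides the backend-specific cause
// (SQLException, IOException, ...) behind the one type the API declares.
class DataStoreException extends Exception {
    DataStoreException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical, trimmed-down store contract: only the exception type matters here.
interface SimpleDataStore {
    byte[] read(String id) throws DataStoreException;
}

// Stand-in for a JDBC-backed store; the query always fails so the wrapping is visible.
class FailingDbDataStore implements SimpleDataStore {
    @Override
    public byte[] read(String id) throws DataStoreException {
        try {
            return queryBlob(id);
        } catch (SQLException e) {
            // Wrap instead of squeezing the SQLException into an IOException;
            // the original exception stays available via getCause().
            throw new DataStoreException("Could not read record " + id, e);
        }
    }

    private byte[] queryBlob(String id) throws SQLException {
        throw new SQLException("simulated failure for " + id);
    }
}
```

Callers only deal with one exception type, while the SQLException (and its SQL state) is still there for logging.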
Another problem we found is that the GC calls p.getStream().close() to
update the last modified time of the record. When the record is in a DB,
this causes a very long delay (we think it retrieves the data from the
store). As a workaround, we hacked in some code that gets the id of the
record and calls back into the data store to update the record, as in:
	Object blob = ((PropertyImpl)
	if (blob instanceof BLOBInDataStore) {
		DataIdentifier id = ((BLOBInDataStore)
This works for the time being, but since InternalValue.internalValue()
is deprecated, a better way is needed :P
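The idea behind the workaround can be sketched independently of the deprecated call: the GC asks the store to "touch" a record by identifier, and the store updates the timestamp without reading any binary data. TouchableStore, touch(), olderThan() and MarkPhase are hypothetical names, and the in-memory map stands in for a database table:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical store API: mark a record as in use by identifier only.
interface TouchableStore {
    void touch(String identifier, long now);
    long lastModified(String identifier);
}

// In-memory stand-in for a table of (ID, LAST_MODIFIED, DATA).
class InMemoryTouchableStore implements TouchableStore {
    private final Map<String, Long> lastModified = new HashMap<>();

    void add(String id, long created) {
        lastModified.put(id, created);
    }

    @Override
    public void touch(String identifier, long now) {
        // In a JDBC store this would be one UPDATE ... SET LAST_MODIFIED = ?
        // WHERE ID = ? statement; the binary column is never read.
        lastModified.computeIfPresent(identifier, (id, old) -> Math.max(old, now));
    }

    @Override
    public long lastModified(String identifier) {
        return lastModified.getOrDefault(identifier, -1L);
    }

    // Records not touched since the scan started are candidates for deletion.
    List<String> olderThan(long cutoff) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastModified.entrySet()) {
            if (e.getValue() < cutoff) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }
}

// Hypothetical mark phase: touch every record still referenced by a property.
class MarkPhase {
    static void mark(TouchableStore store, Iterable<String> referencedIds, long scanStart) {
        for (String id : referencedIds) {
            store.touch(id, scanStart);
        }
    }
}
```

After marking with a scan start time, anything whose timestamp is still older than that time was not referenced during the scan.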
Because of the delay when calling getStream()/getRecord(), the
information the record exposes (its length, for example) has to be
stored in the record itself, instead of relying on the backing store on
every call (as is done in the FileDataRecord class).
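A minimal sketch of such a record, assuming the metadata is loaded once (for example from a single ID/LENGTH/LAST_MODIFIED row) and the binary is only fetched when getStream() is called. CachedDbRecord and the Supplier-based opener are hypothetical, not the actual DataRecord API:

```java
import java.io.InputStream;
import java.util.function.Supplier;

// Hypothetical DB-backed record: metadata lives in fields, so the getters
// never go back to the database.
final class CachedDbRecord {
    private final String identifier;
    private final long length;
    private final long lastModified;
    private final Supplier<InputStream> streamOpener; // invoked only by getStream()

    CachedDbRecord(String identifier, long length, long lastModified,
                   Supplier<InputStream> streamOpener) {
        this.identifier = identifier;
        this.length = length;
        this.lastModified = lastModified;
        this.streamOpener = streamOpener;
    }

    String getIdentifier() { return identifier; }

    long getLength() { return length; }             // no round trip

    long getLastModified() { return lastModified; } // no round trip

    // The binary data (and any DB resources behind it) is requested only here.
    InputStream getStream() { return streamOpener.get(); }
}
```

Deferring the stream this way also means creating a record does not keep a connection open.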
The last issue I want to mention is that when you access a record and
read its stream, you can't close the connection, result set, and
statement used to access it while the stream is still being read, so a
stream that closes its DB resources when it is closed is needed. Along
the same lines, a getRecord() call should not retrieve the stream and
keep the resources open unless explicitly asked to (i.e., by a
getStream() call).
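A sketch of such a stream, assuming the JDBC objects are passed in as AutoCloseable (ResultSet, Statement and Connection all implement it since Java 7). ResourceClosingInputStream is a hypothetical name; a store might construct it as `new ResourceClosingInputStream(rs.getBinaryStream(1), rs, stmt, conn)`:

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Stream handed out by a hypothetical DB data store: closing it also releases
// the JDBC objects that produced it, in the order they were passed in.
final class ResourceClosingInputStream extends FilterInputStream {
    private final AutoCloseable[] resources;
    private boolean closed;

    ResourceClosingInputStream(InputStream in, AutoCloseable... resources) {
        super(in);
        this.resources = resources;
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return; // closing twice must not close the connection twice
        }
        closed = true;
        IOException failure = null;
        try {
            super.close();
        } catch (IOException e) {
            failure = e;
        }
        // Keep closing the remaining resources even if one of them fails.
        for (AutoCloseable r : resources) {
            try {
                r.close();
            } catch (Exception e) {
                if (failure == null) {
                    failure = new IOException("Failed to release resource", e);
                } else {
                    failure.addSuppressed(e);
                }
            }
        }
        if (failure != null) {
            throw failure;
        }
    }
}
```

Passing the resources innermost first (result set, then statement, then connection) releases them in the usual JDBC order.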

On a different but related topic, we're investigating how to connect the
GC with the data store. We have code to run the GC either once on
repository startup or in a background thread. Both approaches have their
pros and cons, of course. The run-once option is easier, but it needs to
grab a global lock for the duration of the run (we're using
RepositoryImpl.shutdownLock for this) so that no session can be started
while a collection is in progress. The background thread, on the other
hand, needs some way to keep track of changes to binary properties. We
haven't given this much thought yet because other things came up, but
we'll get back to it soon.
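The run-once locking pattern can be sketched with a standard read/write lock: logins share the read lock, and the collector takes the write lock. This is only an illustration of the pattern, not how RepositoryImpl.shutdownLock is implemented; RepositoryGate, tryStartSession and runGarbageCollection are hypothetical names:

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical gate: many logins may proceed concurrently, but none can
// start while the collector holds the write lock.
final class RepositoryGate {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Returns false instead of blocking if a collection is in progress.
    boolean tryStartSession(Runnable sessionSetup) {
        if (!lock.readLock().tryLock()) {
            return false;
        }
        try {
            sessionSetup.run();
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Waits for in-progress logins to finish, then blocks new ones until done.
    void runGarbageCollection(Runnable collector) {
        lock.writeLock().lock();
        try {
            collector.run();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Note that ReentrantReadWriteLock lets the thread holding the write lock also acquire the read lock, so the exclusion applies to other threads only.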

We can contribute our code, but it's going to take some time to extract
it (we've carried it across a few Jackrabbit versions). Also note that,
for now, our code is more of a proof of concept than production-ready.


PS: sorry for the lengthy mail

Esteban Franqueiro

