jackrabbit-dev mailing list archives

From "Thomas Müller" <thomas.muel...@day.com>
Subject Re: Workspace.copy() Question ...
Date Wed, 12 Nov 2008 14:36:34 GMT

The problem is: "process the binary only once".

By 'process' we meant 'text extraction', but it could equally be
'virus scan', 'index', 'create a thumbnail', 'transfer' (to the client
or from the client), or 'backup': any expensive task. I believe a good
solution is to provide the object identity to the module (the text
extraction engine, virus scanner, and so on), so that the module can
decide for itself what to do.

Instead of returning a plain InputStream, Jackrabbit would return a
DataStoreInputStream with the additional method getDataIdentifier().
The module can then read the identifier, check whether that binary has
already been processed, and if so skip reading the data altogether. I
believe that would be a flexible solution.

How the module stores its results for this object (the metadata) is
module specific. I don't think the best solution is to always store
them in a file or stream close to the binary. For text extraction, a
separate file may make sense, but probably not for 'virus scan',
because the result is only a flag (you don't need the data).
Thumbnails: for better performance you want to keep them together,
and not save them separately (that is, in the data store).
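
To make the idea concrete, here is a rough sketch of what I have in
mind. It is only an illustration, not existing Jackrabbit code: the
DataStoreInputStream class shown here, the TextExtractor module, its
in-memory cache, and the identifier "id-1" are all assumptions made up
for the example. A real module would choose its own way of storing the
results.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical: a stream that also exposes the identity of the binary.
class DataStoreInputStream extends FilterInputStream {
    private final String dataIdentifier;

    DataStoreInputStream(InputStream in, String dataIdentifier) {
        super(in);
        this.dataIdentifier = dataIdentifier;
    }

    String getDataIdentifier() {
        return dataIdentifier;
    }
}

// Hypothetical module: processes each identifier at most once.
class TextExtractor {
    // Module-specific result storage; here simply an in-memory map.
    private final Map<String, String> resultsById = new HashMap<>();
    int streamsRead = 0; // counts how often data was actually read

    String extract(InputStream in) throws IOException {
        if (in instanceof DataStoreInputStream) {
            String id = ((DataStoreInputStream) in).getDataIdentifier();
            String cached = resultsById.get(id);
            if (cached != null) {
                return cached; // already processed: do not read the data
            }
            String text = readAll(in);
            resultsById.put(id, text);
            return text;
        }
        // No identity available: fall back to reading the stream.
        return readAll(in);
    }

    private String readAll(InputStream in) throws IOException {
        streamsRead++;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        // Stand-in for real text extraction: decode the bytes as UTF-8.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}

class Demo {
    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        TextExtractor extractor = new TextExtractor();
        // The same identifier is presented twice (e.g. after a copy).
        extractor.extract(new DataStoreInputStream(
                new ByteArrayInputStream(data), "id-1"));
        extractor.extract(new DataStoreInputStream(
                new ByteArrayInputStream(data), "id-1"));
        System.out.println(extractor.streamsRead); // prints 1
    }
}
```

A virus scanner could follow the same pattern but keep only a boolean
per identifier instead of the extracted text.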

