jackrabbit-users mailing list archives

From "Seidel. Robert" <Robert.Sei...@aeb.de>
Subject AW: AW: AW: AW: Incremental/deduplicating versioning
Date Wed, 13 Jul 2011 11:10:53 GMT
I think we are talking about different things.

You are talking about the DataStore saving space by deduplicating data and reducing
redundancy in a general way. The DataStore can do this without additional information; I
totally agree with you here. The hash it returns, for example, is also a way to avoid
storing duplicates.
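
To illustrate the hash-based case, here is a minimal sketch of a content-addressed store.
It is not the Jackrabbit DataStore implementation: the class HashingStoreSketch, the
in-memory map, and the choice of SHA-1 are assumptions made only for this example. The
identifier is derived purely from the bytes of the stream, so identical content is stored
once without the store knowing anything about files or versions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory store for illustration; not the Jackrabbit API.
class HashingStoreSketch {
    // Maps the hex digest of the content to the stored bytes.
    private final Map<String, byte[]> records = new HashMap<>();

    // Reads the whole stream, hashes it, and stores it only if the hash is new.
    // Returns the hex digest, which serves as the record identifier.
    String addRecord(InputStream in) throws IOException, NoSuchAlgorithmException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        byte[] data = buffer.toByteArray();

        // SHA-1 is an assumption for this sketch; any content hash would do.
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        StringBuilder id = new StringBuilder();
        for (byte b : md.digest(data)) {
            id.append(String.format("%02x", b));
        }

        // Identical content yields the same identifier and is not stored twice.
        records.putIfAbsent(id.toString(), data);
        return id.toString();
    }

    int recordCount() {
        return records.size();
    }
}
```

Adding the same bytes twice returns the same identifier and leaves a single record in the
map; a stream that differs by even one byte gets a new identifier and a full second copy,
which is exactly the case where a diff would be needed.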

What I'm talking about is storing concrete diffs instead of whole files. This is not possible.
Jackrabbit calls the DataStore (addRecord) with a stream, and the DataStore has absolutely no
idea what the original file was, so it can't determine the concrete diff. Sure, it could look
for similar files (at what performance cost?) and compute a diff (for binaries?) against the
most similar file it finds, but that is not necessarily the diff against the previous version
from the application's point of view.

Regards, Robert

-----Original Message-----
From: Thomas Mueller [mailto:mueller@adobe.com]
Sent: Wednesday, 13 July 2011 11:37
To: users@jackrabbit.apache.org
Subject: Re: AW: AW: AW: Incremental/deduplicating versioning


>The only possible way is to add the information that the DataStore does not
>have (such as the identifier or content of the previous version) to the
>stream (addRecord) from the application side.

I suggest reading http://en.wikipedia.org/wiki/Data_deduplication,
especially "Depending on the type of deduplication, redundant files may be
reduced, or even portions of files or other data that are similar can also be
removed.", and http://en.wikipedia.org/wiki/Rsync#Algorithm
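
For readers who do not follow the rsync link: the algorithm finds matching blocks at
arbitrary byte offsets by using a weak checksum that can be updated in constant time as a
window slides over the data. The following is a minimal sketch of that rolling checksum
only, not of rsync itself and not of anything in Jackrabbit. The class name
RollingChecksumSketch and its methods are made up for this example.

```java
// Hypothetical sketch of an rsync-style weak rolling checksum over a fixed-size window.
class RollingChecksumSketch {
    private static final int MOD = 1 << 16;

    private final int window; // window length in bytes
    private int a;            // sum of the bytes in the window, mod 2^16
    private int b;            // position-weighted sum of the bytes, mod 2^16

    // Computes the checksum from scratch for data[offset .. offset + window - 1].
    RollingChecksumSketch(byte[] data, int offset, int window) {
        this.window = window;
        for (int i = 0; i < window; i++) {
            int x = data[offset + i] & 0xff;
            a = (a + x) % MOD;
            b = (b + (window - i) * x) % MOD;
        }
    }

    // Slides the window one byte to the right in constant time:
    // 'out' is the byte leaving on the left, 'in' the byte entering on the right.
    void roll(byte out, byte in) {
        int o = out & 0xff;
        int n = in & 0xff;
        a = ((a - o + n) % MOD + MOD) % MOD;
        b = ((b - window * o + a) % MOD + MOD) % MOD;
    }

    // Combines both halves into one 32-bit value.
    int value() {
        return a | (b << 16);
    }
}
```

Rolling the window forward gives the same value as recomputing the checksum at the new
offset, which is what makes it cheap to look for known blocks inside a new stream. A weak
match still has to be confirmed with a strong hash before the block is treated as
identical.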

