Date: Tue, 7 Jan 2003 11:37:52 -0600 (CST)
From: Karl Fogel
To: dev@subversion.tigris.org
Cc: dev@apr.apache.org
Subject: serializeable md5 contexts
Reply-To: kfogel@collab.net

The Problem:
============

We need a way to resume checksumming when appending to stored data.
Otherwise, we'd have to recompute the md5 context for all the data
already present, then continue with the same context as the new data
comes in.  If an MD5 digest could be reverted back to an unfinalized
MD5 context, this would be easy, since the representation has the
digest.  But, of course, that's not possible.

(This is for http://subversion.tigris.org/issues/show_bug.cgi?id=689.
The need arises because there's no guarantee that any particular
stream returned from svn_fs__rep_contents_write_stream() will be used
for all the data from beginning to end.)

Proposed Solution:
==================

I'm thinking of a pair of functions:

   /**
    * Return a portable string representation of an MD5 context.
    * apr_md5_resume_context() can convert the string back to a context.
    *
    * @param context An MD5 context
    * @param pool The pool in which to allocate the returned string
    * @return The serialized form of the context
    * @note Call this with a context that has not yet been finalized
    *       with apr_md5_final().
    */
   const char *apr_md5_serialize_context(struct apr_md5_ctx_t *context,
                                         apr_pool_t *pool);

   /**
    * Set an MD5 context to the state represented by a serialized context.
    *
    * @param context The MD5 context to restore
    * @param serialized_context String obtained from
    *        apr_md5_serialize_context()
    * @return The error APR_INVALID_MD5_SERIALIZATION if the
    *         serialized representation cannot be parsed, else success.
    */
   apr_status_t apr_md5_resume_context(struct apr_md5_ctx_t *context,
                                       const char *serialized_context);

Then we'd store the serialized context along with the digest.  In
other words, each time one writes data to a representation, one would:

   1. Resume the rep's context if it has one, else init a new context.

   2. Write the data through, updating the checksum as we go.

   3. Close the stream, reserialize the context, *then* finalize the
      context to compute a new digest, and store both the new
      serialization and the new digest in the rep.

Does anyone see either a better solution, or an unexpected
consequence/problem with this solution?  Writing the serialization and
deserialization isn't particularly hard, but I'd hate to do it and
then discover there was a simpler answer :-).

(By the way, I'm assuming that these would go into apr-util.  But they
wouldn't have to, of course; they could live in Subversion's code if
people don't think they belong in apr-util.)

-K
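
For illustration, the serialize/resume pair proposed above could be
sketched roughly as follows.  This is not APR code: md5_ctx_sketch,
md5_sketch_serialize() and md5_sketch_resume() are hypothetical names,
the struct layout is an assumption modeled on the RFC 1321 reference
implementation, and a plain int return stands in for the proposed
APR_INVALID_MD5_SERIALIZATION status.  Fixed-width hex text is used so
that the string does not depend on host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an unfinalized MD5 context.  The layout is an
 * assumption modeled on the RFC 1321 reference code, not APR's definition. */
typedef struct {
    uint32_t state[4];        /* chaining values A, B, C, D */
    uint32_t count[2];        /* message length in bits, low word first */
    unsigned char buffer[64]; /* input bytes not yet folded into state */
} md5_ctx_sketch;

/* 6 words * 8 hex digits + 64 bytes * 2 hex digits (NUL not counted) */
#define MD5_SKETCH_SERIAL_LEN (6 * 8 + 64 * 2)

/* Write ctx as lowercase hex into out, which must have room for
 * MD5_SKETCH_SERIAL_LEN + 1 bytes. */
void md5_sketch_serialize(const md5_ctx_sketch *ctx, char *out)
{
    int i;

    assert(ctx != NULL && out != NULL);
    for (i = 0; i < 4; i++, out += 8)
        snprintf(out, 9, "%08lx", (unsigned long)ctx->state[i]);
    for (i = 0; i < 2; i++, out += 8)
        snprintf(out, 9, "%08lx", (unsigned long)ctx->count[i]);
    for (i = 0; i < 64; i++, out += 2)
        snprintf(out, 3, "%02x", (unsigned)ctx->buffer[i]);
}

/* Value of one lowercase hex digit, or -1 if c is not one. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Parse exactly ndigits hex digits from s into *out; -1 on a bad digit. */
static int parse_hex(const char *s, int ndigits, uint32_t *out)
{
    uint32_t v = 0;
    int i;

    for (i = 0; i < ndigits; i++) {
        int d = hex_val(s[i]);
        if (d < 0) return -1;
        v = (v << 4) | (uint32_t)d;
    }
    *out = v;
    return 0;
}

/* Restore ctx from a string made by md5_sketch_serialize().
 * Returns 0 on success, -1 if the string cannot be parsed; ctx is left
 * untouched on failure. */
int md5_sketch_resume(md5_ctx_sketch *ctx, const char *s)
{
    md5_ctx_sketch tmp;
    uint32_t v;
    int i;

    assert(ctx != NULL);
    if (s == NULL || strlen(s) != MD5_SKETCH_SERIAL_LEN) return -1;
    for (i = 0; i < 4; i++, s += 8)
        if (parse_hex(s, 8, &tmp.state[i]) != 0) return -1;
    for (i = 0; i < 2; i++, s += 8)
        if (parse_hex(s, 8, &tmp.count[i]) != 0) return -1;
    for (i = 0; i < 64; i++, s += 2) {
        if (parse_hex(s, 2, &v) != 0) return -1;
        tmp.buffer[i] = (unsigned char)v;
    }
    *ctx = tmp;
    return 0;
}
```

In the three-step flow above, the serialize call would run just before
the context is finalized, with the resulting string stored next to the
digest; a later append would call the resume function first and then
keep feeding data through the restored context.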