jackrabbit-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: node naming
Date Tue, 08 Oct 2013 20:13:58 GMT

On Tue, Oct 8, 2013 at 11:45 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> And these arbitrary keys really require that two different normalization
> forms remain different?

I'm afraid they probably do.

While it's unlikely for unnormalized data to be used too frequently in
practice, someone could still easily craft a request or a piece of
content that could confuse code that doesn't expect the repository to
do auto-normalization. Another potentially troublesome example is
in-memory caches and other data structures that use paths as keys and
could thus be circumvented or potentially polluted with invalid data
if we relax path semantics. And yet another is the practice of
avoiding a too flat content hierarchy by distributing content across
subtrees based on the first few characters of a node name, which could
lead to lost, misplaced or duplicated content depending on how the
hierarchy is accessed.
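
As a rough illustration of the cache and sharding concerns above, the
following sketch uses only the JDK, not Jackrabbit. The class name, the
"/content/" path prefix and the two-character sharding helper are
assumptions made up for the example; they do not come from the
repository code.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

class PathKeyDemo {
    // The same visible name in two Unicode normalization forms.
    static final String NFC_NAME = "\u00e9cole";   // precomposed e-acute
    static final String NFD_NAME = "e\u0301cole";  // "e" + combining acute accent

    // Hypothetical sharding scheme: bucket by the first two chars of the name.
    static String shard(String name) {
        return name.substring(0, Math.min(2, name.length()));
    }

    // Stores a value under the NFC path, then looks it up with the NFD path.
    static boolean nfdLookupHitsNfcEntry() {
        Map<String, String> cache = new HashMap<>();
        cache.put("/content/" + NFC_NAME, "cached item");
        return cache.containsKey("/content/" + NFD_NAME);
    }

    static String toNfc(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        System.out.println(NFC_NAME.equals(NFD_NAME));               // false
        System.out.println(nfdLookupHitsNfcEntry());                 // false
        System.out.println(shard(NFC_NAME).equals(shard(NFD_NAME))); // false
        System.out.println(toNfc(NFD_NAME).equals(NFC_NAME));        // true
    }
}
```

Code written against exact string equality treats the two forms as
unrelated keys, so a repository that silently maps one to the other
would disagree with any cache or shard layout computed on the client
side.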

> The use case is real-world users that mix platforms (Windows, Mac) and
> browsers (Webkit vs the rest) and end up with two nodes where there should
> be only one.
> And no, it would need to be done consistently (file upload through browser,
> WebDAV access, other HTTP based APIs, etc), and thus would be very hard to
> do all over the place.

Right, but it still would be doable on that level without potentially
compromising clients that use JCR directly. Combined with a
repository-level validation mechanism that rejects non-normalized
content (or content that after normalization would conflict with
existing content), we could still catch cases where such higher-level
processing hasn't been applied and prevent those from causing trouble.
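
A minimal sketch of what such a validation check might look like,
using only the JDK. The class and method names are hypothetical, and
the choice of NFC as the required form is an assumption; nothing here
is an existing Jackrabbit API.

```java
import java.text.Normalizer;
import java.util.Collection;

class NormalizationValidator {
    // Assumption: NFC is the required form; the thread does not pick one.
    static final Normalizer.Form FORM = Normalizer.Form.NFC;

    static boolean isNormalized(String name) {
        return Normalizer.isNormalized(name, FORM);
    }

    // True if an existing sibling differs from the name as a string
    // but normalizes to the same value.
    static boolean conflictsAfterNormalization(String name, Collection<String> siblings) {
        String key = Normalizer.normalize(name, FORM);
        for (String sibling : siblings) {
            if (!sibling.equals(name) && Normalizer.normalize(sibling, FORM).equals(key)) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical commit-time check for a new node name.
    static void validate(String name, Collection<String> siblings) {
        if (!isNormalized(name)) {
            throw new IllegalArgumentException("Node name is not normalized: " + name);
        }
        if (conflictsAfterNormalization(name, siblings)) {
            throw new IllegalArgumentException("Node name conflicts after normalization: " + name);
        }
    }
}
```

The second check only matters when unnormalized siblings already exist,
for example content created before validation was enabled.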

> I wonder whether we could make normalization (or lack of it) depend on a mixin type?

Another potential solution might be to make such behavior
session-specific. An extra session attribute could be used to enable
auto-normalization just for that session. Clients that expect
filesystem semantics could use that option, while existing
database-oriented clients wouldn't have to worry about such things
(apart from the potential validation errors).
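
To make the session-specific idea concrete, here is a small sketch in
plain Java. The attribute name "example.autoNormalize" and the
SessionNameResolver class are invented for illustration, and the
attribute map merely stands in for whatever attributes a real session
would carry; no such option is known to exist in the repository.

```java
import java.text.Normalizer;
import java.util.Map;

class SessionNameResolver {
    // Hypothetical attribute name, not an existing repository option.
    static final String AUTO_NORMALIZE = "example.autoNormalize";

    private final boolean autoNormalize;

    // The map stands in for the attributes attached to a session.
    SessionNameResolver(Map<String, ?> sessionAttributes) {
        this.autoNormalize = Boolean.TRUE.equals(sessionAttributes.get(AUTO_NORMALIZE));
    }

    // Sessions that opted in get NFC names (NFC is an assumption);
    // all other sessions see the name exactly as given.
    String resolve(String name) {
        return autoNormalize ? Normalizer.normalize(name, Normalizer.Form.NFC) : name;
    }
}
```

A filesystem-style client would set the attribute when opening its
session, while a client that never sets it keeps today's exact-match
behavior.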


Jukka Zitting
