jackrabbit-oak-dev mailing list archives

From Michael Dürig <mdue...@apache.org>
Subject Re: Encoding of JCR values in json
Date Thu, 12 Apr 2012 21:46:39 GMT


On 12.4.12 19:15, Jukka Zitting wrote:
>> While String and Boolean are straightforward, double, long and decimal
>> are already more troublesome.
>
> As basic rules for handling the latter types I'd define something like this:
>
> * A JSON value is a double if value.equals(Double.valueOf(value).toString())
> * A JSON value is a long if value.equals(Long.valueOf(value).toString())
> * A JSON value is a decimal if it's a JSON number that matches neither
> of the above two rules

That's the approach I used in spi2microkernel. However, it has the
drawback that we need to try/catch our way through the cases when using
the valueOf methods to determine the exact numeric type. Otherwise we
have to write our own parsing code...
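
For illustration, the try/catch cascade could look roughly like the sketch
below. The names JsonNumberType, Kind and classify are hypothetical and not
taken from spi2microkernel. The sketch assumes the input is already a
syntactically valid JSON number token, and it rejects NaN and infinities
with an IllegalArgumentException, which is only one of the options discussed
in this thread.

```java
import java.math.BigDecimal;

// Hypothetical helper, not part of spi2microkernel or Oak.
final class JsonNumberType {
    enum Kind { DOUBLE, LONG, DECIMAL }

    // Classifies a JSON number token using the round-trip rules quoted above.
    static Kind classify(String json) {
        try {
            Double d = Double.valueOf(json);
            if (json.equals(d.toString())) {
                // Assumption: special values are rejected rather than encoded.
                if (d.isNaN() || d.isInfinite()) {
                    throw new IllegalArgumentException("Unsupported value: " + json);
                }
                return Kind.DOUBLE;
            }
        } catch (NumberFormatException e) {
            // Not parseable as a double, fall through to the next rule.
        }
        try {
            if (json.equals(Long.valueOf(json).toString())) {
                return Kind.LONG;
            }
        } catch (NumberFormatException e) {
            // Not parseable as a long, fall through to decimal.
        }
        // Throws NumberFormatException if the token is not a number at all.
        new BigDecimal(json);
        return Kind.DECIMAL;
    }
}
```

With these rules "123" classifies as a long (Double.toString would yield
"123.0"), "123.4" as a double, and "1e3" or "0.10" as a decimal because
neither round-trips unchanged.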

> I'd be OK with us explicitly *not* supporting special cases like
> infinities and NaN values. We'd just throw ValueFormatExceptions for
> them on the oak-jcr level and IllegalArgumentExceptions on the
> oak-core (or do something similar). Alternatively we should use
> explicit typing information with a well defined syntax for expressing
> such special cases.

Ack. Either way is fine with me.

>
>> Finally for binary, date, name, path, reference, weakreference and uri
>> there is no direct correspondence in JSON.
>
> Right. There's an additional constraint for binary values in that the
> MicroKernel garbage collector needs some way to connect JSON
> properties to referenced binaries. It would be useful if the same
> convention was used also higher up the stack.

Good point. Maybe Dominique can provide some insight here?


>> The way I solved this in spi2microkernel [1] is by encoding values by
>> serializing them to their string representation (Value.getString()) and
>> prepending the property type (Value.getType()) in radix 16 and a colon (:).
>
> Sounds like a workable solution, though I have some reservations:
>
> * The explicit encoding of numeric constants from JCR seems a bit
> troublesome and makes potential extensions more cumbersome.

What extensions come to mind?

>
> * The overloading of normal strings requires that all string values
> will need to be checked for whether they need to be escaped.

That's a small penalty:

   if(s.length() >= 2 && s.charAt(1) == ':') { ... }
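
To make the cost concrete, here is a minimal sketch of such a prefix codec.
It is not the spi2microkernel code: TypedValueCodec and its methods are
hypothetical, the constants merely mirror the numbering of
javax.jcr.PropertyType, and the escaping rule (a plain string that looks
encoded gets an explicit "1:" prefix) is an assumption made for this example.

```java
// Hypothetical codec illustrating the "type in radix 16 plus colon" prefix.
final class TypedValueCodec {
    // Mirror javax.jcr.PropertyType; redefined here to stay self-contained.
    static final int STRING = 1;
    static final int DATE = 5;
    static final int URI = 11;

    // The check from above: all JCR types (1..12) fit in one hex digit.
    static boolean looksEncoded(String s) {
        return s.length() >= 2 && s.charAt(1) == ':';
    }

    static String encode(int type, String value) {
        if (type == STRING && !looksEncoded(value)) {
            return value; // ordinary strings stay unprefixed
        }
        // Assumption: ambiguous strings are escaped with an explicit prefix.
        return Integer.toString(type, 16) + ':' + value;
    }

    static int decodeType(String json) {
        if (!looksEncoded(json)) {
            return STRING;
        }
        int type = Character.digit(json.charAt(0), 16);
        if (type < 0) {
            throw new IllegalArgumentException("Invalid type prefix: " + json);
        }
        return type;
    }

    static String decodeValue(String json) {
        return looksEncoded(json) ? json.substring(2) : json;
    }
}
```

Under these assumptions "foo" is stored as-is, the string "a:string" becomes
"1:a:string", and a URI value is stored with the prefix "b:".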

>
> An alternative solution would be to use something like the @TypeHint
> feature used by the JSON functionality in Sling. Instead of "@", we
> should use something like "::" that's invalid in a JCR name to prevent
> conflicts. With such a solution the example JSON object would look
> like this:
>
>      "example":{
>        "long":123,
>        "another long":"124",
>        "another long::TypeHint":"long",
>        "double":"123.4",
>        "double::TypeHint":"double",
>        "string":"foo",
>        "another string":"a:string",
>        "another string::TypeHint":"string"
>      }
>
> That's a bit verbose, so we could also put the type hint directly into
> the relevant property name, like this:

The trouble with that is that type info and value are spread across
different properties. Setting a JCR property requires two JSON diff
operations here. Worse for JCR observation: the corresponding set
property entries might be spread across the journal.

>
>      "example":{
>        "long":123,
>        "another long::long":"124",
>        "double::double":"123.4",
>        "string":"foo",
>        "another string::string":"a:string"
>      }
>
> The main downsides of this approach are:
>
> * Name-based property accesses will potentially need to traverse
> through all properties to find a matching name. That should be
> manageable since the implementation can pre-scan all property names
> and split them to name and type parts.
>
> * There's a potential for conflicts like when a JSON object contains
> both "x" and "x::long" properties. That can be dealt with in a commit
> validator that prevents such objects from being persisted.

So there are three different approaches now: 1) encoding the type into 
the value, 2) encoding it into a separate property, or 3) encoding it 
into the name of the property.
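
For 3), the pre-scan and the conflict check Jukka mentions could be sketched
as follows. TypedNames, split and index are hypothetical names, and the
sketch assumes that "::" never occurs inside a valid JCR name, so the first
occurrence always separates name and type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for the "name::type" convention.
final class TypedNames {
    // Returns { name, type }, where type is null if no hint is present.
    static String[] split(String jsonName) {
        int i = jsonName.indexOf("::");
        return i < 0
                ? new String[] { jsonName, null }
                : new String[] { jsonName.substring(0, i), jsonName.substring(i + 2) };
    }

    // Pre-scans all JSON property names once, mapping the plain name to the
    // full JSON name. Rejects objects containing e.g. both "x" and "x::long".
    static Map<String, String> index(Iterable<String> jsonNames) {
        Map<String, String> byName = new HashMap<>();
        for (String jsonName : jsonNames) {
            String name = split(jsonName)[0];
            if (byName.putIfAbsent(name, jsonName) != null) {
                throw new IllegalArgumentException("Conflicting properties for name: " + name);
            }
        }
        return byName;
    }
}
```

A name-based lookup then becomes a single map access on the index instead of
a scan over all properties.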

I think 2) is the most troublesome for the reasons outlined above. Regarding
1) and 3) we should also think about the consequences for query and
indexing. Are there any drawbacks or advantages for either of those? Tom?

>
>> On a related note: what kind of values do we want to expose from oak-core?
>> JSON like or JCR like?
>
> I'd ideally like to keep it JSON-like so we can easily implement a
> JavaScript-friendly HTTP mapping directly based on the Oak API without
> having to go through extra levels of mapping.

Hmmm, seems reasonable. What about Angela's concerns?

Michael

>
>> Implementation-wise, would that en/decoding happen inside oak-jcr or oak-core?
>
> I'd put the JSON-JCR type mapping into a shared helper class in
> oak-core since it'll be needed by a lot of things like query and node
> type handling inside oak-core. But the API interfaces should IMO be
> based on JSON types to support cases where JCR typing isn't needed or
> wanted.
>
> BR,
>
> Jukka Zitting
