jackrabbit-oak-dev mailing list archives

From Thomas Mueller <muel...@adobe.com>
Subject Re: Encoding of JCR values in json
Date Fri, 13 Apr 2012 06:00:46 GMT

>>* A JSON value is a double if
>> * A JSON value is a long if value.equals(Long.valueOf(value).toString())
>> * A JSON value is a decimal if it's a JSON number that matches neither
>> of the above two rules
>That's the approach I used in spi2microkernel. However, it has the
>drawback that we need to try catch through the cases when using the
>valueOf methods for determining the exact numeric type. Or we write our
>own code for parsing...
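
To make the try/catch drawback concrete, here is a minimal sketch of the "long" rule quoted above. The class and method names are hypothetical and are not taken from spi2microkernel. Only the round-trip check value.equals(Long.valueOf(value).toString()) comes from the thread.

```java
final class LongRule {
    // True if the JSON number token round-trips through Long unchanged.
    // Long.valueOf throws for tokens such as "10.0", "1e3" or values
    // outside the long range, so the exception has to be caught.
    static boolean isLong(String value) {
        try {
            return value.equals(Long.valueOf(value).toString());
        } catch (NumberFormatException e) {
            return false;
        }
    }
}
```

With this sketch, "10" is accepted, while "10.0", "010" and "99999999999999999999" are rejected: the first and last by exception, the middle one because it does not round-trip.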

Java BigDecimal values can be very large. I'm not sure if, for example,
MongoDB has some kind of limitation on numbers (precision, length). Then,
how would you distinguish between BigDecimal 10, Double 10, and Long 10?
An alternative might be: numbers with an "e" are double ("10e0"), numbers
with a dot are decimal ("10.0"), all other numbers are long ("10"). But
that would require the MicroKernel to store the *exact* JSON representation.
I don't think that's a good idea either.
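
For illustration only, the lexical alternative described above could look roughly like the sketch below. The class, enum and method names are made up for this example. It looks only at the spelling of the token, which is exactly why it depends on the stored JSON text being preserved character for character.

```java
import java.math.BigDecimal;

final class JsonNumberForms {
    enum Kind { LONG, DOUBLE, DECIMAL }

    // Classify a JSON number token purely by how it is written.
    // The exponent check comes first, so "1.5e3" counts as a double.
    static Kind classify(String token) {
        if (token.indexOf('e') >= 0 || token.indexOf('E') >= 0) {
            return Kind.DOUBLE;
        }
        if (token.indexOf('.') >= 0) {
            return Kind.DECIMAL;
        }
        return Kind.LONG;
    }

    // Parse the token into the Java type implied by its spelling.
    static Object parse(String token) {
        switch (classify(token)) {
            case DOUBLE:
                return Double.valueOf(token);
            case DECIMAL:
                return new BigDecimal(token);
            default:
                return Long.valueOf(token);
        }
    }
}
```

Here "10e0", "10.0" and "10" map to Double, BigDecimal and Long respectively. If a store normalized "10.0" to "10" on the way through, the decimal would silently come back as a long.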

So I would prefer explicit typing, except for Long.

>>I'd be OK with us explicitly *not* supporting special cases like
>> infinities and NaN values.

I would prefer supporting them, using a well defined syntax (as a String).
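
One candidate for such a syntax, offered only as an assumption and not something agreed in this thread, is the text form Java itself uses. JSON numbers cannot express these values, but Double.toString and Double.parseDouble round-trip them as "NaN", "Infinity" and "-Infinity". The class name below is hypothetical.

```java
final class SpecialDoubles {
    // Text form of a double; special values become "NaN",
    // "Infinity" or "-Infinity".
    static String toText(double d) {
        return Double.toString(d);
    }

    // Inverse of toText; accepts the three special spellings as well
    // as ordinary decimal notation.
    static double fromText(String s) {
        return Double.parseDouble(s);
    }
}
```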

>>Right. There's an additional constraint for binary values in that the
>> MicroKernel garbage collector needs some way to connect JSON
>> properties to referenced binaries. It would be useful if the same
>> convention was used also higher up the stack.
>Good point. Maybe Dominique can provide some insight here?

There are two garbage collectors in the MK: the "node data" GC and the
"data store" GC. I guess this is about the data store GC, which I wrote,
not Dominique.

Currently the data store GC works at a high level. Marking binaries that
are still in use is done using MicroKernel.getLength(String blobId). But
the data store GC needs to traverse all nodes in all available revisions,
so it is really slow. It would be nice if binaries could be indexed, so
that garbage collection doesn't have to traverse *all* nodes in the
repository (in all revisions). So binary references should be easy to
recognize.
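
As a rough sketch of why recognizable references matter, the mark and sweep below works on plain in-memory collections. Everything in it is hypothetical: the class name, the "blob:" marker, and the representation of a revision as a flat collection of property values. It is not how the MicroKernel implements the data store GC.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

final class DataStoreGcSketch {
    // Hypothetical marker that makes a binary reference recognizable.
    static final String BLOB_PREFIX = "blob:";

    // Returns the ids of stored blobs that no property value in any
    // revision refers to.
    static Set<String> findGarbage(
            Set<String> storedBlobIds,
            Collection<? extends Collection<String>> revisions) {
        // Mark: collect every blob id referenced in any revision.
        Set<String> marked = new HashSet<>();
        for (Collection<String> propertyValues : revisions) {
            for (String value : propertyValues) {
                if (value.startsWith(BLOB_PREFIX)) {
                    marked.add(value.substring(BLOB_PREFIX.length()));
                }
            }
        }
        // Sweep: whatever was stored but never marked is garbage.
        Set<String> garbage = new HashSet<>(storedBlobIds);
        garbage.removeAll(marked);
        return garbage;
    }
}
```

The mark loop is the expensive part, since it visits every value in every revision. An index of values carrying the marker would let the collector skip that traversal.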

>The way I solved this in spi2microkernel [1] is by encoding values by
> serializing them to their string representation (Value.getString()) and
> prepend its property type (value.getType) in radix 16 and a colon (:).

I think it's a good solution. It has proven to be robust so far. Even
though, for debugging purposes, I would prefer not to use hex digits. But
that's a minor issue really.
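
A minimal sketch of that encoding, as I read the description above (not copied from spi2microkernel; class and method names are hypothetical). The two constants mirror the values of javax.jcr.PropertyType.LONG and PropertyType.DECIMAL so that the example has no JCR dependency.

```java
final class TypedValueCodec {
    // Same numeric values as javax.jcr.PropertyType.LONG and DECIMAL.
    static final int LONG = 3;
    static final int DECIMAL = 12;

    // Prefix the string form of a value with its type in radix 16.
    static String encode(int type, String value) {
        return Integer.toString(type, 16) + ':' + value;
    }

    // Read the type back from the text before the first colon.
    static int decodeType(String encoded) {
        int colon = encoded.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("no type prefix: " + encoded);
        }
        return Integer.parseInt(encoded.substring(0, colon), 16);
    }

    // Everything after the first colon is the value, which may itself
    // contain colons (dates, for example).
    static String decodeValue(String encoded) {
        int colon = encoded.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("no type prefix: " + encoded);
        }
        return encoded.substring(colon + 1);
    }
}
```

A decimal 10.0 is encoded as "c:10.0" and a long 10 as "3:10". The "c" shows the debugging inconvenience of hex: you have to know that it means 12.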

>>>On a related note: what kind of values do we want to expose from
>>> JSON like or JCR like?

I would use JCR like. The (oak-jcr to oak-core) remoting implementation
might use the same JSON conversion as used between oak-core and oak-mk,
but do we really need to define this now?

