jackrabbit-oak-dev mailing list archives

From Michael Dürig <mdue...@apache.org>
Subject Re: Encoding of JCR values in json
Date Fri, 13 Apr 2012 09:53:25 GMT


On 13.4.12 7:00, Thomas Mueller wrote:
> Hi,
>
>>> * A JSON value is a double if
>>> value.equals(Double.valueOf(value).toString())
>>> * A JSON value is a long if value.equals(Long.valueOf(value).toString())
>>> * A JSON value is a decimal if it's a JSON number that matches neither
>>> of the above two rules
>>
>> That's the approach I used in spi2microkernel. However, it has the
>> drawback that we need to try catch through the cases when using the
>> valueOf methods for determining the exact numeric type. Or we write our
>> own code for parsing...
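
A minimal sketch of the three rules quoted above, including the try/catch chain they imply. The class, enum and method names are made up for illustration and are not taken from spi2microkernel:

```java
import java.math.BigDecimal;

class JsonNumberTypes {
    enum NumericType { DOUBLE, LONG, DECIMAL }

    // Classifies a JSON number literal by round-tripping it through
    // Double and Long; anything that survives neither is a decimal.
    static NumericType classify(String value) {
        try {
            if (value.equals(Double.valueOf(value).toString())) {
                return NumericType.DOUBLE;
            }
        } catch (NumberFormatException e) {
            // not parseable as a double, try the next rule
        }
        try {
            if (value.equals(Long.valueOf(value).toString())) {
                return NumericType.LONG;
            }
        } catch (NumberFormatException e) {
            // not parseable as a long, fall through to decimal
        }
        // Throws NumberFormatException if the input is not a number at all.
        new BigDecimal(value);
        return NumericType.DECIMAL;
    }
}
```

Under these rules "10" is a long and "10.0" is a double, while "10.00" and "1e3" fall through to decimal because neither survives the round trip.
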
>
> Java BigDecimal values can be very large. I'm not sure if, for example,
> MongoDB has some kind of limitation on numbers (precision, length). Then,
> how would you distinguish between BigDecimal 10, Double 10, and Long 10?
> An alternative might be: numbers with an "e" are double ("10e0"), numbers
> with a dot are decimal ("10.0"), all other numbers are long ("10"). But
> that would require the MicroKernel to store the *exact* JSON representation.
> I don't think that's a good idea either.
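
The syntactic alternative described above could be sketched as follows. The names are hypothetical, and letting the exponent check win for literals such as "1.5e3" is an assumption the proposal does not spell out:

```java
class SyntacticNumberTypes {
    // Decides the type purely from how the literal is written.
    static String typeOf(String jsonNumber) {
        // Assumption: an exponent takes precedence over a dot ("1.5e3").
        if (jsonNumber.indexOf('e') >= 0 || jsonNumber.indexOf('E') >= 0) {
            return "double";
        }
        if (jsonNumber.indexOf('.') >= 0) {
            return "decimal";
        }
        return "long";
    }
}
```

This only works if the literal comes back exactly as written: a store that normalises "10.0" to "10" would silently turn a decimal into a long.
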

The underlying storage mechanism must not be of concern to the user of 
the Microkernel API. The Microkernel API uses JSON to transport values. 
So the Microkernel implementation is in charge of serializing these into 
a format suitable for the underlying storage mechanism, even if I put a 
number with 10 million digits into a JSON property.

If we want/need to make restrictions here, we will have to clarify and 
document these. See OAK-11.

>
> So I would prefer explicit typing, except for Long.
>
>>> I'd be OK with us explicitly *not* supporting special cases like
>>> infinities and NaN values.
>
> I would prefer supporting them, using a well defined syntax (as a String).
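
JSON has no literals for NaN or the infinities, so one possible reading of "as a String" is sketched below. The class and method names are hypothetical, and the spellings are simply what Double.toString produces, not an agreed syntax:

```java
class SpecialDoubles {
    // Encodes a double as a JSON fragment; NaN and the infinities become
    // quoted strings because JSON numbers cannot express them.
    static String encode(double d) {
        if (Double.isNaN(d) || Double.isInfinite(d)) {
            return "\"" + Double.toString(d) + "\"";
        }
        return Double.toString(d);
    }

    // Accepts either a bare JSON number or one of the quoted special values.
    static double decode(String json) {
        String s = json.startsWith("\"")
                ? json.substring(1, json.length() - 1)
                : json;
        return Double.parseDouble(s);
    }
}
```

A plain string like this cannot be told apart from an ordinary string property with the same text, which is one reason an explicit type marker may still be needed.
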
>
>>> Right. There's an additional constraint for binary values in that the
>>> MicroKernel garbage collector needs some way to connect JSON
>>> properties to referenced binaries. It would be useful if the same
>>> convention was used also higher up the stack.
>>
>> Good point. Maybe Dominique can provide some insight here?
>
> There are two garbage collectors in the MK: the "node data" GC and the
> "data store" GC. I guess this is about the data store GC, which I wrote,
> not Dominique.
>
> Currently the data store GC is on a high level. Marking binaries that are
> still in use is done using MicroKernel.getLength(String blobId). But data
> store GC needs to traverse all nodes in all available revisions, so it is
> really slow. It would be nice if binaries can be indexed, so that garbage
> collection doesn't have to traverse *all* nodes in the repository (in all
> revisions). So binary references should be easy to recognize.

Not sure whether I understand. How could the GC possibly know whether a 
binary is still in use or not? I could do

String blobId = mk.write(inStream);

and write the returned blobId on a piece of paper. According to the 
current Microkernel contract I could come back after a couple of years 
and would still be able to retrieve that blob.
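
To make the concern concrete, here is a toy mark-and-sweep over blob ids. It is an in-memory stand-in with hypothetical names, not the MicroKernel data store GC, and it assumes the mark phase only sees ids that appear in node properties:

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class BlobGcSketch {
    // Returns the blob ids a collector would delete: everything stored
    // that was not marked as referenced from some node.
    static Set<String> sweepCandidates(Set<String> storedBlobIds,
                                       Collection<String> idsReferencedByNodes) {
        Set<String> marked = new HashSet<>(idsReferencedByNodes); // mark phase
        Set<String> garbage = new TreeSet<>(storedBlobIds);
        garbage.removeAll(marked);                                // sweep phase
        return garbage;
    }
}
```

A blob whose id exists only on that piece of paper is never marked, so a collector of this shape would delete it.
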

Michael

>
>> The way I solved this in spi2microkernel [1] is by encoding values by
>> serializing them to their string representation (Value.getString()) and
>> prepend its property type (value.getType) in radix 16 and a colon (:).
>
> I think it's a good solution. It has been proven to be robust so far. Even
> though, for debugging purposes, I would prefer not to use hex digits. But
> that's a minor issue really.
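
The encoding described above (property type in radix 16, a colon, then the string form of the value) could look roughly like this. The class is a hypothetical sketch, not the spi2microkernel code, and it takes the type as a plain int so that it does not depend on the JCR API:

```java
class TypedValueCodec {
    // Prefixes the string form of a value with its property type in hex.
    static String encode(int propertyType, String stringValue) {
        return Integer.toString(propertyType, 16) + ":" + stringValue;
    }

    // Reads the property type back from the hex prefix.
    static int typeOf(String encoded) {
        return Integer.parseInt(encoded.substring(0, encoded.indexOf(':')), 16);
    }

    // Returns everything after the first colon, so values may contain colons.
    static String valueOf(String encoded) {
        return encoded.substring(encoded.indexOf(':') + 1);
    }
}
```

With javax.jcr.PropertyType.DECIMAL being 12, a decimal 10 would be written as "c:10", which is where the hex digits make the prefix harder to read while debugging.
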
>
>>>> On a related note: what kind of values do we want to expose from
>>>> oak-core?
>>>> JSON like or JCR like?
>
> I would use JCR like. The (oak-jcr to oak-core) remoting implementation
> might use the same JSON conversion as used between oak-core and oak-mk,
> but do we really need to define this now?
>
> Regards,
> Thomas
>
