jackrabbit-oak-issues mailing list archives

From "Francesco Mari (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (OAK-7846) Add a tool to export the tree pointed to by a node record
Date Wed, 17 Oct 2018 16:35:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16653807#comment-16653807 ]

Francesco Mari commented on OAK-7846:

The tool receives as input one or more IDs of node records and produces a representation of
the subtrees rooted at those nodes.

I propose a line-based representation for the export that can be easily consumed in a streaming
fashion. Each line starts with a single-character marker that determines how the rest of the
line is interpreted:

* {{#}} is the beginning of a comment. The rest of the line will be ignored. This is useful
to add textual information that does not belong to the export but might still be useful for
debugging purposes.
* {{b}} and {{e}} mark the beginning and the end of an export, respectively. Given that the
tool might receive more than one node record ID as input, it might produce more than one export
in a single stream.
* {{p}} represents a property for the current node. For each property a {{NAME}} and a {{TYPE}}
are always provided.
* {{v}} represents a value for the current property. {{VALUE}} spans until the end of the
line. More than one {{v}} line can be produced for multi-value properties. If {{VALUE}} contains
a newline character, it has to be escaped as {{\n}}. It follows that any backslash character
will need to be escaped too.
* {{c}} is the beginning of a child node named {{NAME}}. When such a line is processed, the
following lines are consumed in the context of this new node.
* {{u}} marks the end of the current node. When such a line is processed, the context will
switch back to the one of the parent node. Every {{c}} line has a corresponding {{u}} line.
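
The producing side of these rules could be sketched as follows. This is only an illustration, not the tool's implementation: the nested-dict tree shape, the function names, the exact line layout ({{p NAME TYPE}}, {{v VALUE}}, {{c NAME}}, one space after the marker) and the {{STRING}} type name are all assumptions made for the example.

```python
def escape_value(value):
    """Escape backslashes first, then newlines, so the value fits on one line."""
    return value.replace("\\", "\\\\").replace("\n", "\\n")


def unescape_value(escaped):
    """Reverse escape_value; reject escape sequences it never produces."""
    out = []
    chars = iter(escaped)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)
        if nxt == "n":
            out.append("\n")
        elif nxt == "\\":
            out.append("\\")
        else:
            raise ValueError(f"invalid escape sequence after backslash: {nxt!r}")
    return "".join(out)


def node_lines(node):
    """Yield the p/v/c/u lines for one node (hypothetical nested-dict shape)."""
    for name, (type_, values) in node["properties"].items():
        yield f"p {name} {type_}"
        for value in values:
            yield "v " + escape_value(value)
    for name, child in node["children"].items():
        yield f"c {name}"
        yield from node_lines(child)
        yield "u"  # every c line gets its matching u line


def export_lines(root):
    """Yield one complete export, delimited by b and e."""
    yield "b"
    yield from node_lines(root)
    yield "e"


tree = {
    "properties": {"text": ("STRING", ["a\nb", "c\\d"])},
    "children": {"kid": {"properties": {}, "children": {}}},
}
lines = list(export_lines(tree))
```

Escaping the backslash before the newline matters: doing it the other way round would double the backslash that was just introduced for {{\n}}.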

The format is designed in such a way that it can be consumed by a finite state automaton processing
one line at a time. This idea was heavily inspired by some work by [~ahanikel], which I hope
he will contribute soon!
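
A consumer along these lines might look like the sketch below. It is a hedged illustration under assumptions that are not part of the proposal: a single space separates the marker from the rest of the line, a {{p}} line is laid out as {{p NAME TYPE}} with {{TYPE}} as the last token, blank lines are ignored, and the type names in the sample are made up. The sketch keeps a stack of open nodes only because it materializes the tree; a consumer that writes nodes out as it goes would not need to hold the whole tree.

```python
def new_node():
    return {"properties": {}, "children": {}}


def parse_exports(lines):
    """Yield one nested dict per export found in an iterable of lines."""
    stack = None  # open nodes, root first; None while outside an export
    prop = None   # property that receives the next "v" line, if any
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # comments, and blank lines (an assumption), are skipped
        marker, _, rest = line.partition(" ")
        if marker == "b" and stack is None:
            stack, prop = [new_node()], None
        elif stack is None:
            raise ValueError(f"line {number}: {line!r} outside an export")
        elif marker == "p":
            name, _, type_ = rest.rpartition(" ")
            if not name or not type_:
                raise ValueError(f"line {number}: malformed property {line!r}")
            prop = {"type": type_, "values": []}
            stack[-1]["properties"][name] = prop
        elif marker == "v" and prop is not None:
            prop["values"].append(rest)  # kept in its escaped form
        elif marker == "c" and rest:
            child = new_node()
            stack[-1]["children"][rest] = child
            stack.append(child)
            prop = None
        elif marker == "u" and not rest and len(stack) > 1:
            stack.pop()
            prop = None
        elif marker == "e" and not rest and len(stack) == 1:
            yield stack[0]
            stack = None
        else:
            raise ValueError(f"line {number}: unexpected line {line!r}")
    if stack is not None:
        raise ValueError("input ended inside an export")


sample = [
    "# hypothetical sample, not real tool output",
    "b",
    "p jcr:primaryType NAME",
    "v nt:unstructured",
    "c child",
    "p tags STRINGS",
    "v a",
    "v b",
    "u",
    "e",
]
trees = list(parse_exports(sample))
```

Because the function is a generator, each export is handed to the caller as soon as its {{e}} line is read, which fits the streaming use the format is meant for.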

There are some alternatives to this proposal:
* reuse the JSON export like the {{export}} command does. I don't like it because the produced
JSON is incorrect. Child node names and property names are conflated as keys in the same JSON
export. Moreover, property types are encoded as part of the property values, which makes the
import of such values non-deterministic.
* use the CND export generated by the {{export}} command. That's simply not an adequate format.
* write the nodes directly into a Segment Store. An export is eventually going to be imported,
so why not import it directly? I think that having a text-based format that you can zip
and send around is just too valuable to forgo, especially if the format is both lossless and
easy to parse. Processing the format defined above and writing the forest into a repository
is quite trivial and should be the responsibility of yet another tool.

[~mduerig], [~dulceanu], [~ahanikel], what is your take on this?

> Add a tool to export the tree pointed to by a node record
> ---------------------------------------------------------
>                 Key: OAK-7846
>                 URL: https://issues.apache.org/jira/browse/OAK-7846
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: segment-tar
>    Affects Versions: 1.10
>            Reporter: Francesco Mari
>            Assignee: Francesco Mari
>            Priority: Major
> oak-segment-tar should have a tool that allows exporting a tree pointed to by a node
> record. The tool must be written in a way that plays along with existing Oak tools (see OAK-7834)
> and conventional UNIX ones.

This message was sent by Atlassian JIRA
