incubator-lucy-dev mailing list archives

From Michael McCandless <>
Subject Re: Metadata encoding
Date Tue, 24 Mar 2009 11:50:06 GMT
Starting with JSON seems like a reasonable choice.


Marvin Humphrey <> wrote:
> Greets,
>
> Lucy indexes will contain significant metadata, which should be written in a
> human-readable format for easy spelunking and debugging.  There are probably
> four main contenders for choice of encoding: JSON, YAML, XML, and a custom
> format.
>
> If we go with a custom format, IMO it should be an extension of JSON.  Our
> needs will not be limited to simple key-value pairs, and designing our own
> full-featured data-description language would be foolish.  Let's try to avoid
> custom formats until we decide that there's no other choice.
>
> XML and YAML are certainly sophisticated enough to handle our data needs.
> However, they both require large, heavyweight parsers, and I think we should
> try to avoid imposing such a dependency on future Lucy C users.
>
> Furthermore, XML is less well-matched to the scalar-list-mapping data
> structures common to the dynamic languages that Lucy targets than either YAML
> or JSON.
>
> YAML offers the advantage of extensible data types.  That's become more
> appealing as I've tried to figure out how to serialize entire schemas in JSON,
> including Analyzer and Similarity specifications.  However, the YAML spec is
> very large.  If we decide that we need YAML's features, I think we ought to
> try to limit ourselves to a subset of the spec.
>
> Still, it would be for the best if we could avoid that kind of complexity, and
> go with the simplest human-readable option that supports scalar-list-mapping
> data structures: JSON.
>
> This excerpt from the YAML 1.2 draft spec points a way forward:
>
>    1.4. Relation to JSON
>
>    Both JSON and YAML aim to be human readable data interchange formats.
>    However, JSON and YAML have different priorities. JSON’s foremost design
>    goal is simplicity and universality. Thus, JSON is trivial to generate and
>    parse, at the cost of reduced human readability. It also uses a lowest
>    common denominator information model, ensuring any JSON data can be easily
>    processed by every modern programming environment.
>
>    In contrast, YAML’s foremost design goals are human readability and
>    support for serializing arbitrary native data structures. Thus, YAML
>    allows for extremely readable files, but is more complex to generate and
>    parse. In addition, YAML ventures beyond the lowest common denominator
>    data types, requiring more complex processing when crossing between
>    different programming environments.
>
>    YAML can therefore be viewed as a natural superset of JSON, offering
>    improved human readability and a more complete information model. This is
>    also the case in practice; every JSON file is also a valid YAML file. This
>    makes it easy to migrate from JSON to YAML if/when the additional features
>    are required.
>
>    It may be useful to define an intermediate format between YAML and JSON.
>    Such a format would be trivial to parse (but not very human readable),
>    like JSON. At the same time, it would allow for serializing arbitrary
>    native data structures, like YAML. Such a format might also serve as
>    YAML’s "canonical format".
>
>    Defining such a "YSON" format (YSON is a Serialized Object Notation) can
>    be done either by enhancing the JSON specification or by restricting the
>    YAML specification. Such a definition is beyond the scope of this
>    specification.
>
> (Note that YAML version 1.2 is not well supported yet; most parsers support
> 1.0 or 1.1.)
>
> I'm sure we can hammer all the data we need into JSON; it's just a matter of
> at what point it becomes so inelegant that wandering outside the JSON spec
> into YAML becomes the best solution.  That's not a threshold we should cross
> lightly, so for now I advocate that we try to work within JSON's constraints.
>
> Marvin Humphrey
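
To make the "hammer it into JSON" idea concrete, here is a minimal sketch in Python of one way a schema with Analyzer and Similarity specifications might be written as plain scalar-list-mapping data. It is only an illustration under stated assumptions: the "_class" tagging convention, every field name, and the class names are hypothetical and are not anything Lucy defines.

```python
import json

# Hypothetical index metadata. Every key and class name below is an
# illustrative assumption, not Lucy's actual file format.
schema = {
    "format": 1,
    "fields": {
        "title": {
            "type": "text",
            # An object that YAML could express with a type tag is written
            # here as an ordinary mapping carrying a "_class" string.
            "analyzer": {"_class": "StemmingAnalyzer", "language": "en"},
        },
        "body": {
            "type": "text",
            "analyzer": {"_class": "WhitespaceAnalyzer", "stoplist": ["a", "the"]},
        },
    },
    "similarity": {"_class": "Similarity", "length_norm": True},
}


def dump_metadata(data):
    """Serialize metadata as indented, key-sorted JSON for easy reading."""
    return json.dumps(data, indent=2, sort_keys=True)


def find_classes(node):
    """Collect every "_class" value found anywhere in the decoded data."""
    found = []
    if isinstance(node, dict):
        if "_class" in node:
            found.append(node["_class"])
        for value in node.values():
            found.extend(find_classes(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_classes(item))
    return found


text = dump_metadata(schema)
restored = json.loads(text)  # any stock JSON parser can read the result
print(text)
print(sorted(find_classes(restored)))
```

Since the tag is an ordinary string key, the file stays within the JSON spec and remains readable by hand. The cost is that the loader has to interpret the tag by convention, which is roughly the inelegance being weighed against YAML's extensible types.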
