hadoop-common-dev mailing list archives

From Vuk Ercegovac <verc...@us.ibm.com>
Subject Re: Multi-language serialization discussion
Date Tue, 11 Nov 2008 08:02:09 GMT

>> We should also consider the JAQL work.
> Yes.  I've started to look at that more.  Their examples imply a binary
> format for JSON, but I can find no details.
> Doug

The place to start for Jaql's JSON binary is:


An Item wraps a JSON value (arrays, objects, which are called records in the
code, and atoms) for (de)serialization.
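To make the Item idea concrete, here is a minimal sketch of such a wrapper. The names and the tag-based encoding are illustrative assumptions, not Jaql's actual binary layout: a type tag is written before each value so arrays, records, and atoms can be (de)serialized from one stream.

```java
// Hypothetical sketch (names invented; not Jaql's real API): an Item-style
// wrapper that tags each JSON value with its kind for binary (de)serialization.
import java.io.*;
import java.util.*;

public class ItemSketch {
    // Minimal JSON value kinds: atoms, arrays, and records (objects).
    enum Kind { NULL, LONG, STRING, ARRAY, RECORD }

    static void write(DataOutput out, Object v) throws IOException {
        if (v == null) {
            out.writeByte(Kind.NULL.ordinal());
        } else if (v instanceof Long) {
            out.writeByte(Kind.LONG.ordinal());
            out.writeLong((Long) v);
        } else if (v instanceof String) {
            out.writeByte(Kind.STRING.ordinal());
            out.writeUTF((String) v);
        } else if (v instanceof List) {
            out.writeByte(Kind.ARRAY.ordinal());
            List<?> a = (List<?>) v;
            out.writeInt(a.size());
            for (Object e : a) write(out, e);
        } else if (v instanceof Map) {
            out.writeByte(Kind.RECORD.ordinal());
            Map<?, ?> r = (Map<?, ?>) v;
            out.writeInt(r.size());
            for (Map.Entry<?, ?> e : r.entrySet()) {
                out.writeUTF((String) e.getKey());
                write(out, e.getValue());
            }
        } else {
            throw new IOException("unsupported type: " + v.getClass());
        }
    }

    static Object read(DataInput in) throws IOException {
        Kind k = Kind.values()[in.readByte()];
        switch (k) {
            case NULL:   return null;
            case LONG:   return in.readLong();
            case STRING: return in.readUTF();
            case ARRAY: {
                int n = in.readInt();
                List<Object> a = new ArrayList<>(n);
                for (int i = 0; i < n; i++) a.add(read(in));
                return a;
            }
            default: {  // RECORD
                int n = in.readInt();
                Map<String, Object> r = new LinkedHashMap<>();
                for (int i = 0; i < n; i++) r.put(in.readUTF(), read(in));
                return r;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Object> rec = new LinkedHashMap<>();
        rec.put("name", "jaql");
        rec.put("version", 1L);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        write(new DataOutputStream(buf), rec);
        Object back = read(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(back.equals(rec)); // record round-trips intact
    }
}
```

Tagging every value keeps the stream self-describing, which is what lets a single Item type carry any JSON value.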

The way this comes together with Input/OutputFormats is as follows:
anything that can be read by an InputFormat is considered to be a JSON
array. The default is to assume a SequenceFileInputFormat where Item is the
value type (and consequently the value may be any JSON value). There are
several ways to
override the default behavior so that other InputFormats can be used and
converted to JSON. More info can be found at
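The "anything readable by an InputFormat is a JSON array" view amounts to a per-record converter plus concatenation. The sketch below is illustrative only (no real Jaql or Hadoop classes); it shows the shape of such an adapter for a line-based input, where each line becomes one JSON record in the array.

```java
// Illustrative sketch (invented names, no Hadoop dependency): viewing a
// record stream from an InputFormat as one JSON array by applying a
// per-record conversion function.
import java.util.*;
import java.util.function.Function;

public class JsonArrayView {
    // Convert each native record to a JSON value; the whole stream is the array.
    static <T> List<Object> asJsonArray(Iterable<T> records,
                                        Function<T, Object> convert) {
        List<Object> array = new ArrayList<>();
        for (T rec : records) array.add(convert.apply(rec));
        return array;
    }

    public static void main(String[] args) {
        // e.g. a TextInputFormat-style stream of CSV lines, each mapped
        // to a JSON record (here, a plain Map standing in for one)
        List<String> lines = Arrays.asList("a,1", "b,2");
        List<Object> json = asJsonArray(lines, line -> {
            String[] f = line.split(",");
            Map<String, Object> rec = new LinkedHashMap<>();
            rec.put("key", f[0]);
            rec.put("count", Long.parseLong(f[1]));
            return rec;
        });
        System.out.println(json);
    }
}
```

Overriding the default then just means supplying a different InputFormat together with a matching converter.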

There is currently limited support in Jaql for schema; integrating it more
deeply is one of our top priorities.
We've developed a preliminary schema language:

and integrated it with the language for simple validation:

Since the use of schema is not deeply integrated, we are certainly open to
other schema languages such as
the current JSON Schema proposal.
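To illustrate the kind of check a schema enables, here is a hedged sketch of field-level validation. The schema representation (field name to expected type) is invented for this example; Jaql's actual schema language is richer and differs in form.

```java
// Hypothetical sketch: validate a JSON record (as a Map) against a simple
// schema mapping required field names to expected value types.
import java.util.*;

public class SchemaCheck {
    static boolean validates(Map<String, Object> record,
                             Map<String, Class<?>> schema) {
        for (Map.Entry<String, Class<?>> field : schema.entrySet()) {
            Object v = record.get(field.getKey());
            // field must be present and of the expected type
            if (v == null || !field.getValue().isInstance(v)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Class<?>> schema = new LinkedHashMap<>();
        schema.put("name", String.class);
        schema.put("age", Long.class);

        Map<String, Object> ok = new LinkedHashMap<>();
        ok.put("name", "x");
        ok.put("age", 30L);

        Map<String, Object> bad = new LinkedHashMap<>();
        bad.put("name", "y");
        bad.put("age", "thirty"); // wrong type for "age"

        System.out.println(validates(ok, schema));  // true
        System.out.println(validates(bad, schema)); // false
    }
}
```

The same type information a validator consults is what a storage layer or optimizer could exploit, which is why deeper schema integration pays off beyond validation.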

As mentioned earlier in the thread, schema would be tremendously useful for
validation as well as storage/runtime
efficiency. For Jaql, it can also be exploited for query optimization. The
current plan is to easily support validation (but not require it) when
reading in JSON. Following that, we plan to look into storage and query
optimization opportunities. Deducing schemas sounds very interesting as
