drill-dev mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Apache Drill
Date Mon, 19 Oct 2015 00:52:30 GMT

On Sun, Oct 18, 2015 at 11:37 AM, Julian Hyde <jhyde@apache.org> wrote:

> ...
> My proposed “solution” — and I suspect you’re not going to like it — is to
> ignore, for now, harder XML problems and focus on the easier ones.

Hmm... I think that this may or may not be easy. But it is really important.

> A lot of XML documents do not have repeating scalar values. They are
> collections of records, perhaps with nested records or nested collections
> of records.

The scalar-ness of my example was just a simplification. The same problem
occurs every time there is a list that sometimes contains 1 element.
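
To make the single-element ambiguity concrete, here is a minimal sketch in Python. It is not code from this thread or from Drill; `naive_to_json` is a hypothetical mapper that assumes a child tag should become a list only when it repeats. Under that assumption the same schema produces two different JSON shapes, depending on how many elements happen to be present.

```python
import xml.etree.ElementTree as ET

def naive_to_json(elem):
    """Hypothetical naive mapper: a child tag becomes a list only if it repeats."""
    children = list(elem)
    if not children:
        return elem.text  # leaf element: keep its text
    out = {}
    for child in children:
        value = naive_to_json(child)
        if child.tag in out:
            # Second occurrence of the tag: promote the existing value to a list.
            if not isinstance(out[child.tag], list):
                out[child.tag] = [out[child.tag]]
            out[child.tag].append(value)
        else:
            out[child.tag] = value
    return out

one = naive_to_json(ET.fromstring(
    "<dept><employee><name>Ann</name></employee></dept>"))
two = naive_to_json(ET.fromstring(
    "<dept><employee><name>Ann</name></employee>"
    "<employee><name>Bob</name></employee></dept>"))

# "employee" is a map in `one` but a list of maps in `two`.
print(type(one["employee"]).__name__, type(two["employee"]).__name__)
```

A reader of the output cannot tell from the one-employee document alone that "employee" is meant to be a list, so the type of the field changes from record to record.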

> Whitespace can be safely thrown away. Namespaces are not used.

> A lot of data is in XML format because XML was the only option considered,
> not because the data structure pushed the limits of what XML’s rich model
> can express.

> I think 90% of cases can be handled using a simple XML-to-JSON mapper that
> takes hints such as that the “employee” tag is to become a list of JSON
> maps and the “salary” and “name” tags are to be treated as attributes.

The real question is whether or not the XML community already has such a
hinting mechanism.  Or is Drill about to reinvent that?
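
For reference, a hint-driven mapper of the kind Julian describes might look roughly like the sketch below. This is an illustration only: the function `hinted_to_json` and the hint names `list_tags` and `scalar_tags` are hypothetical, and nothing here is an existing Drill feature or an established XML-community mechanism.

```python
import xml.etree.ElementTree as ET

def hinted_to_json(elem, list_tags, scalar_tags):
    """Hypothetical mapper driven by hints.

    list_tags:   tags that always become a list of maps, even with one element
    scalar_tags: tags whose text is stored directly as a field value
    """
    out = {}
    for child in elem:
        if child.tag in scalar_tags or len(child) == 0:
            # Hinted scalar (or an unhinted leaf): keep the trimmed text.
            out[child.tag] = (child.text or "").strip()
            continue
        value = hinted_to_json(child, list_tags, scalar_tags)
        if child.tag in list_tags:
            # The hint decides the shape, not the number of occurrences.
            out.setdefault(child.tag, []).append(value)
        else:
            out[child.tag] = value
    return out

doc = ET.fromstring(
    "<dept><employee><name>Ann</name><salary>100</salary></employee></dept>")
result = hinted_to_json(doc, list_tags={"employee"},
                        scalar_tags={"name", "salary"})
print(result)
```

With the hint in place, a document containing a single "employee" still yields a one-element list, so the field has the same type in every record.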

> I really think that if we focus on the harder cases we’ll end up with the
> wrong solution.

No doubt.  This isn't one of those.
