Hey Ryan,
All of the projects mentioned in that thread are for serializing / deserializing JSON to/from
case classes that you’ve already built by hand, or for accessing JSON directly without spec’ing
out case classes at all.
I’m proposing a Maven plugin that inspects all of the jsonschemas in a module and whatever
schemas they extend, and generates traits and case classes into which JSON can be loaded /
unloaded. These classes would be natively compatible with Spark SQL, Play, and other frameworks
that are optimized for operating on instances of case classes.
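To make the idea concrete, here is a minimal sketch of the core step such a plugin would perform: turning a schema's property types into Scala case class source. It is written in Java, as a Maven plugin typically would be, and it is not an existing streams or jsonschema2pojo API. The class name CaseClassSketch, the type mapping, and the choice to wrap every field in Option are all assumptions for illustration; nested objects, arrays, $ref, and extends resolution are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: renders Scala case class source from a flat map of
// property name -> JSON Schema primitive type. Not part of any real library.
public class CaseClassSketch {

    // Assumed mapping from JSON Schema primitive types to Scala types.
    public static String scalaType(String jsonType) {
        switch (jsonType) {
            case "string":  return "String";
            case "integer": return "Long";
            case "number":  return "Double";
            case "boolean": return "Boolean";
            default:        return "Any"; // objects, arrays, and $ref are not handled here
        }
    }

    // Every field becomes Option[T] with a None default, on the assumption
    // that schema properties are optional unless declared required.
    public static String render(String className, Map<String, String> properties) {
        StringJoiner fields = new StringJoiner(", ");
        for (Map.Entry<String, String> p : properties.entrySet()) {
            fields.add(p.getKey() + ": Option[" + scalaType(p.getValue()) + "] = None");
        }
        return "case class " + className + "(" + fields + ")";
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the declared property order in the output.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("id", "string");
        props.put("displayName", "string");
        System.out.println(render("Actor", props));
    }
}
```

Run as written, this prints `case class Actor(id: Option[String] = None, displayName: Option[String] = None)`. A real plugin would also need to emit traits for extended schemas and resolve references across files, which is where most of the work lies.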
Also, we’d be able to generate org.apache.streams.scala.json as a complement to the existing
org.apache.streams.pojo.json off the activity streams POJOs and use them to work with activity
streams data in those frameworks - without the compute/memory overhead and code ugliness of
constantly converting between Scala primitives/arrays/maps and Java primitives/arrays/maps.
If you run across any Apache licensed libraries out there that tackle these problems, I’d
love to have a look at them.
Steve Blackmon
sblackmon@apache.org
On Mon, Apr 25, 2016 at 11:29 AM Ryan Ebanks <ryanebanks@gmail.com> wrote:
I think being able to generate case classes from json schema is valuable.
However, there are already projects that attempt to do this. See this Stack
Overflow question/answer:
http://stackoverflow.com/questions/23531065/scala-parse-json-directly-into-a-case-class
What will streams do that will be better/different than these projects?
On Thu, Apr 21, 2016 at 12:13 PM, Steve Blackmon <sblackmon@apache.org> wrote:
> tl;dr We should build a suite of Maven plugins to generate new categories
> of source and resource artifacts. For starters, we need our own jsonschema
> to Java POJO plugin.
>
> For a while I’ve been working on stories to add the ability to generate
> new types of sources and resources from jsonschemas, including the activity
> streams schemas maintained by the project.
>
> 1. [New Feature] STREAMS-389
>    Support generation of scala source from jsonschemas
>    <https://issues.apache.org/jira/browse/STREAMS-389>
> 2. [New Feature] STREAMS-398
>    Support generation of hive table definitions from jsonschema
>    <https://issues.apache.org/jira/browse/STREAMS-398>
>
> I've gotten pretty deep into this and believe strongly at this point that
> diversifying the types of artifacts our project can generate off schemas
> will add a powerful and valuable set of use cases. There’s a lot of
> work being done in Spark and Flink to enable, simplify, and optimize
> working with data when quality POJOs and Scala case classes are available
> on the class path.
>
> There are several other popular big data technologies where having an
> explicit definition of object structure makes working with data easier
> (Hadoop, Pig, Elasticsearch, Kafka, just to name a few). Making it simple
> to generate those artifacts using CLIs or Maven plugins from in-house
> schemas, from schemas mixed in from streams providers and processors, or
> from schemas linked externally on the web could be the killer app streams
> has been missing.
>
> To really pursue this it makes sense that we would build up core utilities
> for resolving and managing the object types defined and referenced across
> groups of schemas and external dependencies. To date we've relied entirely
> on org.jsonschema2pojo:jsonschema2pojo-core and
> org.jsonschema2pojo:jsonschema2pojo-maven-plugin to handle this conversion of
> schemas to POJOs. I think we need to bring that core capability in-house
> to have full control of its behavior and output.
>
> Questions for the list:
> Does this challenge resonate with you / your organization?
> Do you have any concerns about shifting project attention toward plugins
> and tools for data definition?
> Are you comfortable / uncomfortable with seeing the core streams POJOs
> used throughout our providers and processors change as part of this effort?
>
> Steve Blackmon
> sblackmon@apache.org