streams-dev mailing list archives

From: Steve Blackmon
Subject: Re: Source and Resource generation from jsonschemas
Date: Mon, 25 Apr 2016 16:51:41 GMT

Hey Ryan,

All of the projects mentioned in that thread are for serializing / deserializing JSON to/from
case classes that you’ve already built by hand, or for accessing JSON directly without spec’ing
out case classes at all. 

I’m proposing a maven plugin that inspects all of the jsonschemas in a module and whatever
schemas they extend, and generates traits and case classes into which JSON can be loaded /
unloaded.  These classes would be natively compatible with Spark SQL, Play, and other frameworks
that are optimized for operating on instances of case classes.
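
As a rough illustration only (not the plugin itself), the mapping from a schema's properties to a case class could look something like the sketch below. It is written in Java, as a maven plugin typically is. The class name CaseClassSketch, the flat name-to-type property map, the type mapping, and the choice to wrap every field in Option are all assumptions made for illustration; nested objects, arrays, $ref, and extends are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch: turns a flat map of jsonschema property types
// into the source text of a Scala case class.
final class CaseClassSketch {

    // Assumed mapping from jsonschema primitive types to Scala types.
    static String scalaType(String jsonType) {
        switch (jsonType) {
            case "string":  return "String";
            case "integer": return "Long";
            case "number":  return "Double";
            case "boolean": return "Boolean";
            default:        return "Any"; // objects, arrays, and $ref are out of scope here
        }
    }

    // Renders one case class. Every field becomes an Option defaulting to None,
    // which is an assumption: "required" is ignored in this sketch.
    static String render(String className, Map<String, String> properties) {
        StringJoiner fields = new StringJoiner(", ", "case class " + className + "(", ")");
        for (Map.Entry<String, String> property : properties.entrySet()) {
            fields.add(property.getKey() + ": Option[" + scalaType(property.getValue()) + "] = None");
        }
        return fields.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the properties in declaration order.
        Map<String, String> props = new LinkedHashMap<>();
        props.put("id", "string");
        props.put("displayName", "string");
        props.put("upvotes", "integer");
        // prints: case class Actor(id: Option[String] = None, displayName: Option[String] = None, upvotes: Option[Long] = None)
        System.out.println(render("Actor", props));
    }
}
```

A real plugin would also have to resolve the schemas a given schema extends and emit traits for them, which this sketch leaves out.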

Also, we’d be able to generate org.apache.streams.scala.json as a complement to the existing
org.apache.streams.pojo.json off the activity streams POJOs and use them to work with activity
streams data in those frameworks - without the compute/memory overhead and code ugliness of
constantly converting between scala primitives/arrays/maps and java primitives/arrays/maps.

If you run across any Apache licensed libraries out there that tackle these problems, I’d
love to have a look at them.

Steve Blackmon

On Mon, Apr 25, 2016 at 11:29 AM Ryan Ebanks wrote:

I think being able to generate case classes from json schema is valuable.

However, there are already projects that attempt to do this. See this Stack
Overflow question/answer.
What will streams do that will be better/different than these projects?

On Thu, Apr 21, 2016 at 12:13 PM, Steve Blackmon wrote:

> tl;dr We should build a suite of maven-plugins to generate new categories
> of source and resource artifacts. For starters, we need our own jsonschema
> to java pojo plugin.


> For a while I’ve been working on stories to add the ability to generate
> new types of sources and resources from jsonschemas, including the activity
> streams schemas maintained by the project.



> 1. STREAMS-389: Support generation of scala source from jsonschemas
> 2. STREAMS-398: Support generation of hive table definitions from jsonschema



> I've gotten pretty deep into this and believe strongly at this point that
> diversifying the types of artifacts our project can generate off schemas
> will add a powerful and valuable set of use cases. There’s a lot of
> work being done in spark and flink to enable, simplify, and optimize
> working with data when quality POJOs and scala case classes are available
> on the class path.


> There are a number of other popular big data technologies where having an
> explicit definition of object structure makes working with data easier
> (hadoop, pig, elasticsearch, kafka, just to name a few). Making it simple
> to generate those artifacts using CLIs or maven plugins off in-house
> schemas, mixing in schemas from streams providers and processors or schemas
> linked externally on the web, could be the killer app streams has been missing.


> To really pursue this it makes sense that we would build up core utilities
> for resolving and managing the object types defined and referenced across
> groups of schemas and external dependencies. To date we've relied entirely
> on org.jsonschema2pojo:jsonschema2pojo and
> org.jsonschema2pojo:jsonschema2pojo-maven-plugin to handle this conversion of
> schemas to POJOs. I think we need to bring that core capability in-house
> to have full control of its behavior and output.


> Questions for the list:
> Does this challenge resonate with you / your organization?
> Do you have any concern about shifting project attention toward plugins
> and tools for data definition?
> Are you comfortable / uncomfortable with seeing the core streams POJOs
> used throughout our providers and processors change as part of this effort?


> Steve Blackmon
