community-dev mailing list archives

From: Hervé BOUTEMY <herve.bout...@free.fr>
Subject: Re: Standards for mail archive statistics gathering?
Date: Thu, 07 May 2015 03:44:36 GMT
On Wednesday, May 6, 2015 12:48:34 Steve Blackmon wrote:
> > For visualization, for sure, json is the current natural format when data
> > is consumed from the browser.
> > I don't have great experience on this, and what I'm missing with json
> > currently is a common practice on documenting a structure: are there
> > common practices?
> 
> In podling streams [0], we make extensive use of json schema [1]
thank you, that's exactly what I was initially looking for: json schema!

> from which we generate POJOs with a maven plugin jsonschema2pojo [2]
> which makes manipulating the objects in Java/Scala pleasant.  I expect
> other languages have similar jsonschema-based ORM paradigms as well.
As a regular Java developer, I find your tooling interesting.
But in the projects-new.a.o case, data extraction is coded in Python: if
we create a json schema, having Python classes generated from it could simplify coding.
Is anyone around with Python + json schema experience?
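For the Python side, here is a minimal sketch of the idea using only the standard library. It assumes a flat schema (an `object` with `properties` and `required`) and builds a dataclass from it. Nested objects, `$ref` and `allOf` are not handled. The `class_from_schema` helper and the project schema below are hypothetical illustrations, not part of projects-new.a.o or of any existing tool.

```python
import json
from dataclasses import field, make_dataclass
from typing import Any, Optional

# Map JSON Schema primitive type names to Python types (simple cases only).
JSON_TO_PY = {
    "string": str,
    "integer": int,
    "number": float,
    "boolean": bool,
    "array": list,
    "object": dict,
}


def class_from_schema(name: str, schema: dict) -> type:
    """Build a dataclass from a flat JSON Schema object definition.

    Required properties become mandatory fields; the others default to None.
    Nested objects, $ref, allOf, etc. are deliberately not handled here.
    """
    required = set(schema.get("required", []))
    mandatory, optional = [], []
    for prop, spec in schema.get("properties", {}).items():
        py_type = JSON_TO_PY.get(spec.get("type"), Any)
        if prop in required:
            mandatory.append((prop, py_type))
        else:
            optional.append((prop, Optional[py_type], field(default=None)))
    # Dataclass fields without defaults must precede fields with defaults.
    return make_dataclass(name, mandatory + optional)


# Hypothetical project-record schema, for illustration only.
PROJECT_SCHEMA = json.loads("""
{
  "type": "object",
  "properties": {
    "name":       {"type": "string"},
    "pmc":        {"type": "string"},
    "homepage":   {"type": "string"},
    "committers": {"type": "integer"}
  },
  "required": ["name", "pmc"]
}
""")

Project = class_from_schema("Project", PROJECT_SCHEMA)
p = Project(name="Apache Maven", pmc="maven")
print(p)  # Project(name='Apache Maven', pmc='maven', homepage=None, committers=None)
```

Note that dataclasses do not validate the type hints at runtime. Checking a document against the schema is a separate step that this sketch does not cover.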


> This pattern supports inheritance both within and across projects - for
> example see how [3] extends [4] which extends [5].  These schemas are
> relatively self documenting, but generating documentation or other
> artifacts is straight-forward as they are themselves json documents.
yeah, json schema documents are easy to read (at least the examples on the
site are...)
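Because the schemas are themselves json documents, both the inheritance chain and the documentation can be handled with a few lines of code. The sketch below assumes that a child schema names its parent through an `extends` keyword holding a `$ref`, the convention jsonschema2pojo-style schemas use. The three schemas are heavily trimmed, hypothetical stand-ins for the object, group and committee chain in [5], [4] and [3], not their real contents. The sketch merges properties along the chain and prints a plain-text property list.

```python
from typing import Dict

# Hypothetical, trimmed stand-ins for an object -> group -> committee chain.
# These are NOT the real contents of the linked schema files.
SCHEMAS: Dict[str, dict] = {
    "object.json": {
        "properties": {
            "id": {"type": "string", "description": "Unique identifier"},
            "displayName": {"type": "string", "description": "Human-readable name"},
        },
    },
    "group.json": {
        "extends": {"$ref": "object.json"},
        "properties": {
            "members": {"type": "array", "description": "Member identifiers"},
        },
    },
    "committee.json": {
        "extends": {"$ref": "group.json"},
        "properties": {
            "chair": {"type": "string", "description": "Chair of the committee"},
        },
    },
}


def resolve_properties(name: str, schemas: Dict[str, dict]) -> Dict[str, dict]:
    """Merge properties along the 'extends' chain, parents first; child definitions win.

    There is no cycle detection: a schema that extends itself would recurse forever.
    """
    schema = schemas[name]
    parent = schema.get("extends", {}).get("$ref")
    merged = resolve_properties(parent, schemas) if parent else {}
    merged.update(schema.get("properties", {}))
    return merged


def document(name: str, schemas: Dict[str, dict]) -> str:
    """Render a plain-text property list for one schema, inherited properties included."""
    lines = [f"{name}:"]
    for prop, spec in resolve_properties(name, schemas).items():
        lines.append(f"  {prop} ({spec.get('type', 'any')}): {spec.get('description', '')}")
    return "\n".join(lines)


print(document("committee.json", SCHEMAS))
```

Each call builds a fresh merged dict, so documenting one schema never mutates the others.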

> 
> > Because for simple json structure, documentation is not really necessary,
> > but once the structure goes complex, documentation is really a key
> > requirement for people to use or extend. And I already see this
> > shortcoming with the 11 json files from projects-new.a.o =
> > https://projects-new.apache.org/json/foundation/
> Having used these json documents a few weeks ago to build an apache
> community visualization [6]
yeah, really nice visualization!

> IMO the current crop of project-new jsons
> are intermediate artifacts rather than a sufficiently cross-purpose
> data model, a role currently held by DOAP mbox and misc others all
> with some inherent shortcomings most notably lack of navigability
> between silos.
+1
I'm at the point where I'm starting to really understand the concepts involved,
and I want to code a simple data model: I'll report back here once I have a
first version available.

> I'd like to nominate activity streams [7] with
> community-specific extensions (such as those roughly prototyped here:
> [8] ) as a potential core data model for this effort going forward
I had a first look at it: it is more complex than what I had in mind.
We'll have to share our ideas and see which is the best bet.

> and
> I'm happy to help apply some of the useful tools and connectors within
> podling streams toward that end. Converting external structured
> sources into normalized documents and indexing those activities to
> power data-centric APIs and visualizations are wheelhouse use cases
> for this project, as they say.
Great, stay tuned: I'll probably work on it this weekend.
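To make the "converting external structured sources into normalized documents" step concrete for mail archives, here is a minimal sketch using only the Python standard library. It turns each message of an mbox file into a dict whose top-level keys loosely follow Activity Streams 1.0 JSON (`actor`, `verb`, `object`, `target`, `published`). The `objectType` values and the two helper functions are assumptions for illustration, not the community-specific extensions prototyped in [8]. It also normalizes every `Date` header to UTC, which matters when comparing activity across contributors in different time zones.

```python
import email
import mailbox
from datetime import timezone
from email.message import Message
from email.utils import parseaddr, parsedate_to_datetime
from typing import Iterator


def message_to_activity(msg: Message, list_name: str) -> dict:
    """Turn one mail message into an Activity Streams 1.0-style dict (hypothetical mapping)."""
    display_name, address = parseaddr(msg.get("From", ""))
    published = None
    if msg.get("Date"):
        dt = parsedate_to_datetime(msg["Date"])  # malformed dates are not handled here
        if dt.tzinfo is None:  # "-0000" means unknown zone: assume UTC (an assumption)
            dt = dt.replace(tzinfo=timezone.utc)
        published = dt.astimezone(timezone.utc).isoformat()
    return {
        "verb": "post",
        "published": published,
        "actor": {"objectType": "person", "id": f"mailto:{address}",
                  "displayName": display_name or address},
        "object": {"objectType": "note", "id": msg.get("Message-ID"),
                   "displayName": msg.get("Subject")},
        "target": {"objectType": "collection", "displayName": list_name},
    }


def activities_from_mbox(path: str, list_name: str) -> Iterator[dict]:
    """Yield one activity per message in an mbox file."""
    for msg in mailbox.mbox(path):
        yield message_to_activity(msg, list_name)


# Hypothetical message, for illustration only.
RAW = """\
From: Example Person <person@example.org>
Date: Wed, 06 May 2015 12:48:34 -0500
Subject: Re: Standards for mail archive statistics gathering?
Message-ID: <hypothetical-1@example.org>

Body text.
"""

activity = message_to_activity(email.message_from_string(RAW), "community-dev")
print(activity["published"])  # 2015-05-06T17:48:34+00:00
```

Keeping the mapping in a single function makes it easy to swap in a different target model later, whether that ends up being activity streams or a simpler structure.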

Regards,

Hervé

> 
> [0] http://streams.incubator.apache.org/
> [1] http://json-schema.org/documentation.html
> [2] http://www.jsonschema2pojo.org/
> [3] https://github.com/steveblackmon/streams-apache/blob/master/activities/src/main/jsonschema/objectTypes/committee.json
> [4] https://github.com/apache/incubator-streams/blob/master/streams-pojo/src/main/jsonschema/objectTypes/group.json
> [5] https://github.com/apache/incubator-streams/blob/master/streams-pojo/src/main/jsonschema/object.json
> [6] http://72.182.111.65:3000/workspace/3
> [7] http://activitystrea.ms/
> [8] https://github.com/steveblackmon/streams-apache/blob/master/activities/src/main/jsonschema
> 
> Steve Blackmon
> sblackmon@apache.org
> 
> On Wed, May 6, 2015 at 2:05 AM, Hervé BOUTEMY <herve.boutemy@free.fr> wrote:
> > On Tuesday, May 5, 2015 21:26:36 Shane Curcuru wrote:
> >> On 5/5/15 7:33 AM, Boris Baldassari wrote:
> >> > Hi Folks,
> >> > 
> >> > Sorry for the late answer on this thread. Don't know what has been done
> >> > since then, but I've some experience to share on this, so here are my
> >> > 2c..
> >> 
> >> No, more input is always appreciated!  Hervé is doing some
> >> centralization of the projects-new.a.o data capture, which is related
> >> but slightly separate.
> > 
> > +1
> > this can give a common place to put code once experiments show that we
> > should add a new data source
> > 
> >> But this is going to be a long-term project
> > 
> > +1
> > 
> >> with
> >> plenty of different people helping I bet.
> > 
> > I hope so...
> > 
> >> ...
> >> 
> >> > * Parsing mboxes for software repository data mining:
> >> > There is a suite of tools exactly targeted at this kind of duty on
> >> > github: Metrics Grimoire [1], developed (and used) by Bitergia [2]. I
> >> > don't know how they manage time zones, but the toolsuite is widely used
> >> > around (see [3] or [4] as examples) so I believe they are quite robust.
> >> > It includes tools for data retrieval as well as visualisation.
> >> 
> >> Drat.  Metrics Grimoire looks pretty nifty - essentially a set of
> >> frameworks for extracting metadata from a bunch of sources - but it's
> >> GPL, so personally I have no interest in working on it.  If someone else
> >> uses it to generate datasets that's great.
> >> 
> >> > * As for the feedback/thoughts about the architecture and formats:
> >> > I love the REST-API idea proposed by Rob. That's really easy to access
> >> > and retrieve through scripts on-demand. CSV and JSON are my favourite
> >> > formats, because they are, again, easy to parse and widely used --
> >> > every language and library has some facility to read them natively.
> >> 
> >> Yup - again, like project visualization, to make any of this simple for
> >> newcomers to try stuff, we need to separate data gathering / model /
> >> visualization.  Since most of these are spare time projects, having easy
> >> chunks makes it simpler for different people to try their hand at it.
> > 
> > For visualization, for sure, json is the current natural format when data
> > is consumed from the browser.
> > I don't have great experience on this, and what I'm missing with json
> > currently is a common practice on documenting a structure: are there
> > common practices?
> > Because for simple json structure, documentation is not really necessary,
> > but once the structure goes complex, documentation is really a key
> > requirement for people to use or extend. And I already see this
> > shortcoming with the 11 json files from projects-new.a.o =
> > https://projects-new.apache.org/json/foundation/
> > 
> > Regards,
> > 
> > Hervé
> > 
> >> Thanks,
> >> 
> >> - Shane

