streams-dev mailing list archives

From Steve Blackmon <>
Subject Integration, Documentation, Opportunities
Date Wed, 22 Jan 2014 07:34:54 GMT
Danny: thanks for trying it out.  It would be excellent if you would
test drive our examples and help with READMEs.  Good catch on the
manifest - I was launching the jar with -cp.  I see no reason we
couldn't formally open source our examples and possibly migrate them
into the streams repository if the list thinks that is a good idea and
will help develop them and add additional examples.  I've seen example
implementations maintained both inside and outside of platform

The next persistence modules we are preparing to contribute are hdfs
and elasticsearch, both useful and surprisingly tricky to get right
performance-wise.  A recursive link unwinder, boilerpipes article
extractor, and lucene tagger exist but need some work.

Not hard to come up with other modules that would be useful as part of
a real-time data flow and relatively straightforward to write.  A
Rome-based RSS collector for example.  IRC listener.  OpenNLP?  Who
has other ideas or prototypes we could integrate?

-----Original Message-----
From: Danny Sullivan []
Sent: Tuesday, January 21, 2014 3:08 PM
Subject: RE: Substantial commit to new branch

Hey Steve,

Cool stuff! Let me know when a new place in svn is set up for the
examples you've written, I'd be happy to add running instructions to
the wiki for new developers. I needed to change the pom.xml for
twitter-sample-standalone to specify the main method to be
But I think it should work after that. Perhaps I can submit a pull
request for that.

Looking forward to integrating other platforms with Streams,
Danny

> Date: Mon, 13 Jan 2014 09:57:13 -0600

> Subject: Re: Substantial commit to new branch

> From:

> To:


> On Fri, Jan 10, 2014 at 4:47 PM, Steve Blackmon <> wrote:


> > Greetings,

> >

> > Yesterday I completed a push of code we've been using to ingest data

> > streams from several major data providers, validate their messages,

> > and convert them to activitystreams format.



> Very cool.



> > There are some new top-level

> > modules, including

> >    a) streams-core - standard interfaces for the atomic units of

> > streams - providers, persisters, and processors

> >    b) streams-pojo - Jackson-compatible beans generated from

> > activitystreams json schemas

> >


> Tickets need to be created to remove the dependency on the Rave

> ActivityStreams implementation then.



> >    c) streams-contrib - a collection of implementation modules, two

> > or more of which can be imported into a new project and woven

> > together to create a customized performant data stream to execute

> > with java jar, storm jar, hadoop jar, yarn jar, etc...

> >    d) streams-config - a typesafe-based configuration scheme that

> > allows individual modules and coordinator code to pull the

> > configuration parameters they require or support from supplied

> > defaults, environment variables, run-time property files, command

> > line parameters, or accessible HTTP end-points.

> >

> > I'd love to see this project emerge as a code workspace where social

> > data vendors and consumers collaborate to ease the process of

> > integration, and facilitate data interchange with public data

> > schemas and protocols such as xml and json activitystreams formats.

> > No jvm-centric social data interoperability ecosystem exists today

> > to my knowledge.  Hopefully this code will become a valuable

> > starting point.  We have additional assets we will commit to

> > streams-contrib in the coming months as we get them cleaned up,

> > compliant with the streams-core interfaces, unit-tested, and real-world tested.

> >

> > I've also created a separate external repository with some reference

> > data pipelines that demonstrate how to assemble various modules into

> > end-to-end streams at

> >

> > rjhp7f3HFLf6QQm66jhOYYCCrEgfSNlmI0kojGx8zauDYKrc9RgAhBfj-ndLuz_E9LZv


> > 2cFASO7bUojGhzlJrajz-GQb-7x0_W6QfcBU_dKc2XZPhOnEgfSNlmI3z2tk94pjQ_BMlisvRmDixpIxIWNXlJIj_w09JNdNAS2NF8Qg33p2AZxemDCy1tmkPh1eFEwxVwQg48X2NcTsT34c19HJncxO.
> > Today it contains a working twitter gardenhose to activitystreams

> > java process, and a storm-based firehose processor that is still WIP.

> > More to come in this repo as well.

> >


> Do you intend to contribute this to Apache?  If so, we should setup a

> different area in SVN for it.



> >

> > Would love to get feedback on the concepts, patterns, and interfaces

> > proposed.  Will seek to merge with master in the standard 72 hours

> > unless anyone objects.

> >

> > Best,

> > Steve Blackmon

> >
