streams-dev mailing list archives

From Jason Letourneau <jletournea...@gmail.com>
Subject Re: Integration, Documentation, Opportunities
Date Wed, 22 Jan 2014 15:42:29 GMT
What about the Google Docs activity publishing they just released?  That
would be useful in many ways...

On Wed, Jan 22, 2014 at 2:34 AM, Steve Blackmon <sblackmon@apache.org> wrote:
> Danny: thanks for trying it out.  It would be excellent if you would
> test drive our examples and help with READMEs.  Good catch on the
> manifest - I was launching the jar with -cp.  I see no reason we
> couldn't formally open source our examples and possibly migrate them
> into the streams repository if the list thinks that is a good idea and
> will help develop them and add additional examples.  I've seen example
> implementations maintained both inside and outside of platform
> repositories.
>
>
> The next persistence modules we are preparing to contribute are hdfs
> and elasticsearch, both useful and surprisingly tricky to get right
> performance-wise.  A recursive link unwinder, boilerpipe article
> extractor, and Lucene tagger exist but need some work.
>
>
> Not hard to come up with other modules that would be useful as part of
> a real-time data flow and relatively straightforward to write.  A
> Rome-based RSS collector for example.  IRC listener.  OpenNLP?  Who
> has other ideas or prototypes we could integrate?
>
>
> -----Original Message-----
> From: Danny Sullivan [mailto:dsullivan7@hotmail.com]
> Sent: Tuesday, January 21, 2014 3:08 PM
> To: dev@streams.incubator.apache.org
> Subject: RE: Substantial commit to new branch
>
>
> Hey Steve,
>
> Cool stuff! Let me know when a new place in svn is set up for the
> examples you've written; I'd be happy to add running instructions to
> the wiki for new developers. I needed to change the pom.xml for
> twitter-sample-standalone to specify the main method to be
> <mainClass>org.apache.streams.twitter.example.TwitterSampleStandalone</mainClass>.
> But I think it should work after that. Perhaps I can submit a pull
> request for that.
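
The manifest point above comes down to this: `java -jar` needs a `Main-Class` entry in the jar's manifest, while `java -cp` with an explicit class name does not. A minimal Java sketch for checking that entry follows. The class `ManifestCheck`, its helper methods, and the demo class name `com.example.Main` are hypothetical and are not part of the Streams codebase; only the `java.util.jar` calls are standard.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical helper, not part of Apache Streams.
class ManifestCheck {

    /** Returns the Main-Class attribute of a jar's manifest, or null if absent. */
    static String mainClassOf(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Manifest manifest = jarFile.getManifest();
            if (manifest == null) {
                return null; // jar has no META-INF/MANIFEST.MF at all
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    /** Writes a throwaway jar whose manifest optionally names a main class. */
    static Path writeJar(String mainClass) throws IOException {
        Manifest manifest = new Manifest();
        // Manifest-Version must be set, otherwise the main attributes are not written.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        if (mainClass != null) {
            manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        }
        Path jar = Files.createTempFile("manifest-check", ".jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar), manifest)) {
            // No class entries are needed; the manifest alone is enough for this demo.
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        // "com.example.Main" is a placeholder class name for the demo jar.
        System.out.println(mainClassOf(writeJar("com.example.Main")));
        System.out.println(mainClassOf(writeJar(null)));
    }
}
```

A jar for which `mainClassOf` returns null can still be run with `-cp` and an explicit class name, which would explain why the missing entry went unnoticed.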
>
> Looking forward to integrating other platforms with Streams, Danny
>
>> Date: Mon, 13 Jan 2014 09:57:13 -0600
>
>> Subject: Re: Substantial commit to new branch
>
>> From: m.ben.franklin@gmail.com
>
>> To: dev@streams.incubator.apache.org
>
>>
>
>> On Fri, Jan 10, 2014 at 4:47 PM, Steve Blackmon <sblackmon@apache.org> wrote:
>
>>
>
>> > Greetings,
>
>> >
>
>> > Yesterday I completed a push of code we've been using to ingest data
>
>> > streams from several major data providers, validate their messages,
>
>> > and convert them to activitystreams format.
>
>>
>
>>
>
>> Very cool.
>
>>
>
>>
>
>> > There are some new top-level
>
>> > modules, including
>
>> >    a) streams-core - standard interfaces for the atomic units of
>
>> > streams - providers, persisters, and processors
>
>> >    b) streams-pojo - Jackson-compatible beans generated from
>
>> > activitystreams json schemas
>
>> >
>
>>
>
>> Tickets need to be created to remove the dependency on the Rave
>
>> ActivityStreams implementation then.
>
>>
>
>>
>
>> >    c) streams-contrib - a collection of implementation modules, two
>
>> > or more of which can be imported into a new project and woven
>
>> > together to create a customized performant data stream to execute
>
>> > with java jar, storm jar, hadoop jar, yarn jar, etc...
>
>> >    d) streams-config - a typesafe-based configuration scheme that
>
>> > allows individual modules and coordinator code to pull the
>
>> > configuration parameters they require or support from supplied
>
>> > defaults, environment variables, run-time property files, command
>
>> > line parameters, or accessible HTTP end-points.
>
>> >
>
>> > I'd love to see this project emerge as a code workspace where social
>
>> > data vendors and consumers collaborate to ease the process of
>
>> > integration, and facilitate data interchange with public data
>
>> > schemas and protocols such as xml and json activitystreams formats.
>
>> > No jvm-centric social data interoperability ecosystem exists today
>
>> > to my knowledge.  Hopefully this code will become a valuable
>
>> > starting point.  We have additional assets we will commit to
>
>> > streams-contrib in the coming months as we get them cleaned up,
>
>> > compliant with the streams-core interfaces, unit-tested, and real-world tested.
>
>> >
>
>> > I've also created a separate external repository with some reference
>
>> > data pipelines that demonstrate how to assemble various modules into
>
>> > end-to-end streams at
>
>> > http://cp.mcafee.com/d/1jWVIp43qb9EVKOOY-OedTdFEIzDxRQTxNJd5x5Z5dB4s
>
>> > rjhp7f3HFLf6QQm66jhOYYCCrEgfSNlmI0kojGx8zauDYKrc9RgAhBfj-ndLuz_E9LZv
>
>> > ChMVxZYtRXBQQXFY-CYOUCPORQX8FGTKzOEuvkzaT0QSyrhdTVeZXTLuZXCXCOsVHkiP
>
>> > 2cFASO7bUojGhzlJrajz-GQb-7x0_W6QfcBU_dKc2XZPhOnEgfSNlmI3z2tk94pjQ_BMlisvRmDixpIxIWNXlJIj_w09JNdNAS2NF8Qg33p2AZxemDCy1tmkPh1eFEwxVwQg48X2NcTsT34c19HJncxO.
>
>> > Today it contains a working twitter gardenhose to activitystreams java process,
>
>> > and a storm-based firehose processor that is still WIP.  More to come in this repo as well.
>
>> >
>
>>
>
>> Do you intend to contribute this to Apache?  If so, we should set up a
>
>> different area in SVN for it.
>
>>
>
>>
>
>> >
>
>> > Would love to get feedback on the concepts, patterns, and interfaces
>
>> > proposed.  Will seek to merge with master in the standard 72 hours
>
>> > unless anyone objects.
>
>> >
>
>> > Best,
>
>> > Steve Blackmon
>
>> >
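
The quoted announcement describes streams-core as interfaces for providers, processors, and persisters that are woven together into a stream. As a rough illustration of that idea only, here is a minimal Java sketch. The interface names, method signatures, and the `run` helper are all assumptions made for this example; the actual streams-core interfaces are not shown in the thread and may look quite different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; none of these types come from Apache Streams.
class PipelineSketch {

    /** Assumed shape: a provider emits raw items from some source. */
    interface Provider<T> {
        List<T> read();
    }

    /** Assumed shape: a processor turns one input into zero or more outputs. */
    interface Processor<I, O> {
        List<O> process(I input);
    }

    /** Assumed shape: a persister writes finished items to a sink. */
    interface Persister<T> {
        void write(T item);
    }

    /** Pushes every provided item through the processor and into the persister. */
    static <I, O> int run(Provider<I> provider, Processor<I, O> processor, Persister<O> persister) {
        int written = 0;
        for (I raw : provider.read()) {
            for (O out : processor.process(raw)) {
                persister.write(out);
                written++;
            }
        }
        return written; // number of items handed to the persister
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        // In-memory stand-ins for a real source, a validation/conversion step, and a store.
        Provider<String> provider = () -> Arrays.asList("hello", "", "streams");
        Processor<String, String> processor = s -> s.isEmpty()
                ? Collections.<String>emptyList()                  // drop invalid (empty) messages
                : Collections.singletonList(s.toUpperCase());      // "convert" valid ones
        Persister<String> persister = sink::add;

        int written = run(provider, processor, persister);
        System.out.println(written + " items persisted: " + sink);
    }
}
```

Letting the processor return a list covers both validation (an empty list drops the message) and conversion in one step, which mirrors the ingest, validate, and convert flow described in the announcement.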
