Date: Wed, 22 Jan 2014 01:34:54 -0600
Subject: Integration, Documentation, Opportunities
From: Steve Blackmon
To: dev@streams.incubator.apache.org

Danny: thanks for trying it out. It would be excellent if you would test-drive our examples and help with the READMEs. Good catch on the manifest: I was launching the jar with -cp, so I never hit the missing main class.

I see no reason we couldn't formally open-source our examples and possibly migrate them into the streams repository, if the list thinks that is a good idea and that doing so will help us develop them and add more examples. I've seen example implementations maintained both inside and outside of platform repositories.

The next persistence modules we are preparing to contribute are HDFS and Elasticsearch. Both are useful, and both are surprisingly tricky to get right performance-wise. A recursive link unwinder, a boilerpipe article extractor, and a Lucene tagger also exist but need some work.

It's not hard to come up with other modules that would be useful as part of a real-time data flow and relatively straightforward to write: a Rome-based RSS collector, for example, or an IRC listener. OpenNLP? Who has other ideas or prototypes we could integrate?
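To make the recursive link unwinder idea concrete for anyone who wants to pick it up, here is a minimal Java sketch. It is not the existing module: the class name LinkUnwinder, the injected redirect-lookup function, and the hop limit are assumptions, and a real implementation would resolve redirects over HTTP rather than from a map.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch, not the existing Streams module: follows redirects
// (e.g. from URL shorteners) until a URL no longer redirects.
final class LinkUnwinder {
    // Injected lookup: returns the redirect target of a URL, or empty if none.
    // Assumption: a real module would issue HTTP requests here instead of reading a map.
    private final Function<String, Optional<String>> redirectLookup;
    private final int maxHops;

    LinkUnwinder(Function<String, Optional<String>> redirectLookup, int maxHops) {
        this.redirectLookup = redirectLookup;
        this.maxHops = maxHops;
    }

    String unwind(String url) {
        Set<String> seen = new HashSet<>();
        String current = url;
        // Stop after maxHops redirects, or as soon as a URL repeats (redirect loop).
        for (int hop = 0; hop < maxHops && seen.add(current); hop++) {
            Optional<String> next = redirectLookup.apply(current);
            if (!next.isPresent()) {
                return current; // no further redirect: final destination
            }
            current = next.get();
        }
        return current; // hop limit or loop reached: last URL reached
    }

    public static void main(String[] args) {
        Map<String, String> redirects = new HashMap<>();
        redirects.put("http://short.example/a", "http://short.example/b");
        redirects.put("http://short.example/b", "http://example.com/article");
        LinkUnwinder unwinder =
                new LinkUnwinder(u -> Optional.ofNullable(redirects.get(u)), 10);
        System.out.println(unwinder.unwind("http://short.example/a")); // http://example.com/article
    }
}
```

Injecting the lookup keeps the loop detection and hop limit testable offline; the same class could wrap a shortener cache or an HTTP client without changing the traversal logic.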
-----Original Message-----
From: Danny Sullivan [mailto:dsullivan7@hotmail.com]
Sent: Tuesday, January 21, 2014 3:08 PM
To: dev@streams.incubator.apache.org
Subject: RE: Substantial commit to new branch

Hey Steve,

Cool stuff! Let me know when a new place in svn is set up for the examples you've written; I'd be happy to add running instructions to the wiki for new developers.

I needed to change the pom.xml for twitter-sample-standalone to specify the main class as org.apache.streams.twitter.example.TwitterSampleStandalone, but I think it should work after that. Perhaps I can submit a pull request for that.

Looking forward to integrating other platforms with Streams,
Danny

> Date: Mon, 13 Jan 2014 09:57:13 -0600
> Subject: Re: Substantial commit to new branch
> From: m.ben.franklin@gmail.com
> To: dev@streams.incubator.apache.org
>
> On Fri, Jan 10, 2014 at 4:47 PM, Steve Blackmon wrote:
>
> > Greetings,
> >
> > Yesterday I completed a push of code we've been using to ingest data
> > streams from several major data providers, validate their messages,
> > and convert them to activitystreams format.
> >
>
> Very cool.
>
> > There are some new top-level modules, including:
> > a) streams-core: standard interfaces for the atomic units of
> > streams (providers, persisters, and processors)
> > b) streams-pojo: Jackson-compatible beans generated from
> > activitystreams JSON schemas
> >
>
> Tickets need to be created to remove the dependency on the Rave
> ActivityStreams implementation then.
>
> > c) streams-contrib: a collection of implementation modules, two
> > or more of which can be imported into a new project and woven
> > together to create a customized, performant data stream to execute
> > with java -jar, storm jar, hadoop jar, yarn jar, etc.
> > d) streams-config: a Typesafe-based configuration scheme that
> > allows individual modules and coordinator code to pull the
> > configuration parameters they require or support from supplied
> > defaults, environment variables, run-time property files, command
> > line parameters, or accessible HTTP end-points.
> >
> > I'd love to see this project emerge as a code workspace where social
> > data vendors and consumers collaborate to ease the process of
> > integration and facilitate data interchange with public data
> > schemas and protocols such as the XML and JSON activitystreams formats.
> > To my knowledge, no JVM-centric social data interoperability ecosystem
> > exists today. Hopefully this code will become a valuable
> > starting point. We have additional assets we will commit to
> > streams-contrib in the coming months as we get them cleaned up,
> > compliant with the streams-core interfaces, unit-tested, and
> > real-world tested.
> >
> > I've also created a separate external repository with some reference
> > data pipelines that demonstrate how to assemble various modules into
> > end-to-end streams at
> > http://cp.mcafee.com/d/1jWVIp43qb9EVKOOY-OedTdFEIzDxRQTxNJd5x5Z5dB4s
> > rjhp7f3HFLf6QQm66jhOYYCCrEgfSNlmI0kojGx8zauDYKrc9RgAhBfj-ndLuz_E9LZv
> > ChMVxZYtRXBQQXFY-CYOUCPORQX8FGTKzOEuvkzaT0QSyrhdTVeZXTLuZXCXCOsVHkiP
> > 2cFASO7bUojGhzlJrajz-GQb-7x0_W6QfcBU_dKc2XZPhOnEgfSNlmI3z2tk94pjQ_BMlisvRmDixpIxIWNXlJIj_w09JNdNAS2NF8Qg33p2AZxemDCy1tmkPh1eFEwxVwQg48X2NcTsT34c19HJncxO.
> > Today it contains a working Twitter gardenhose-to-activitystreams
> > Java process and a Storm-based firehose processor that is still a
> > work in progress. More to come in this repo as well.
> >
>
> Do you intend to contribute this to Apache? If so, we should set up a
> different area in SVN for it.
>
> >
> > I would love to get feedback on the concepts, patterns, and interfaces
> > proposed. I will seek to merge with master after the standard 72 hours
> > unless anyone objects.
> >
> > Best,
> > Steve Blackmon
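For readers new to the streams-core concepts described in the quoted announcement, here is a minimal single-threaded Java sketch of how a provider, a chain of processors, and a persister can be woven together. The interface names and signatures below (Provider, Processor, Persister, SimpleStream) are hypothetical stand-ins chosen for illustration, not the actual streams-core API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for the three atomic units; NOT the streams-core signatures.
interface Provider<T> {
    Iterator<T> read();              // emits documents from a source
}

interface Processor<T> {
    List<T> process(T document);     // zero, one, or many outputs per input
}

interface Persister<T> {
    void write(T document);          // sends a document to a sink
}

// Weaves one provider, a chain of processors, and one persister into a single-threaded loop.
final class SimpleStream<T> {
    private final Provider<T> provider;
    private final List<Processor<T>> processors;
    private final Persister<T> persister;

    SimpleStream(Provider<T> provider, List<Processor<T>> processors, Persister<T> persister) {
        this.provider = provider;
        this.processors = processors;
        this.persister = persister;
    }

    void run() {
        Iterator<T> documents = provider.read();
        while (documents.hasNext()) {
            List<T> batch = Collections.singletonList(documents.next());
            // Each processor sees every output of the previous stage.
            for (Processor<T> processor : processors) {
                List<T> next = new ArrayList<>();
                for (T document : batch) {
                    next.addAll(processor.process(document));
                }
                batch = next;
            }
            for (T document : batch) {
                persister.write(document);
            }
        }
    }
}

final class SimpleStreamDemo {
    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        Provider<String> provider = () -> Arrays.asList("hello", "", "streams").iterator();
        Processor<String> dropEmpty = doc ->
                doc.isEmpty() ? Collections.<String>emptyList() : Collections.singletonList(doc);
        Processor<String> upperCase = doc -> Collections.singletonList(doc.toUpperCase(Locale.ROOT));
        Persister<String> persister = sink::add;
        new SimpleStream<String>(provider, Arrays.asList(dropEmpty, upperCase), persister).run();
        System.out.println(sink); // [HELLO, STREAMS]
    }
}
```

Returning a list from each processor lets a stage drop a document (empty list) or fan one out into several. A runtime such as Storm would distribute these stages instead of running them in one loop; this sketch only shows the composition.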
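The layering idea behind streams-config (pulling parameters from defaults, environment variables, property files, and command-line parameters) can likewise be sketched with the plain JDK. This is not the Typesafe Config API. The LayeredConfig class, the --key=value argument convention, the example keys, and the precedence order (command line, then property file, then environment, then defaults) are all assumptions for illustration, and HTTP end-points are left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

// Hypothetical layered lookup (plain JDK, not Typesafe Config): layers are
// consulted from highest to lowest priority and the first match wins.
final class LayeredConfig {
    private final List<Map<String, String>> layers = new ArrayList<>(); // highest priority first

    LayeredConfig withLayer(Map<String, String> layer) {
        layers.add(layer);
        return this;
    }

    Optional<String> get(String key) {
        for (Map<String, String> layer : layers) {
            String value = layer.get(key);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    // Assumed convention: "--key=value" arguments; anything else is ignored.
    static Map<String, String> fromArgs(String[] args) {
        Map<String, String> values = new HashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("--") && eq > 2) {
                values.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return values;
    }

    // Parses java.util.Properties text, e.g. the contents of a run-time .properties file.
    static Map<String, String> fromProperties(String text) throws IOException {
        Properties properties = new Properties();
        properties.load(new StringReader(text));
        Map<String, String> values = new HashMap<>();
        for (String name : properties.stringPropertyNames()) {
            values.put(name, properties.getProperty(name));
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> defaults = new HashMap<>(); // hypothetical keys
        defaults.put("twitter.batchSize", "100");
        defaults.put("elasticsearch.host", "localhost");
        LayeredConfig config = new LayeredConfig()
                .withLayer(fromArgs(new String[] {"--twitter.batchSize=500"})) // command line
                .withLayer(fromProperties("elasticsearch.host=es.internal\n")) // property file
                .withLayer(System.getenv())                                    // environment
                .withLayer(defaults);                                          // supplied defaults
        System.out.println(config.get("twitter.batchSize").orElse("?"));  // 500
        System.out.println(config.get("elasticsearch.host").orElse("?")); // es.internal
    }
}
```

Keeping each source as a plain map means a module only asks for the keys it needs and never cares where a value came from, which is the property the announcement describes.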