streams-dev mailing list archives

From Matt Franklin <m.ben.frank...@gmail.com>
Subject Re: [DISCUSS] Continuing the Momentum
Date Fri, 18 Apr 2014 23:18:28 GMT
On Fri, Apr 18, 2014 at 5:11 PM, Danny Sullivan <dsullivan7@hotmail.com> wrote:

> "If streams could collect activity data (whatever format), store
> it, aggregate it and provide analytics on that data as a package I think
> you've won."
>

Add "query" and I agree.


>
>
> +1
>
> I think Chris and I had the same idea at the same time
>
> > Date: Fri, 18 Apr 2014 14:04:37 -0700
> > Subject: Re: [DISCUSS] Continuing the Momentum
> > From: chris@cxtsoftware.com
> > To: dev@streams.incubator.apache.org
> >
> > Steve, while I agree with what you are saying, I still caution you to limit
> > the scope of the streams project. There is a big difference between
> > creating a tool and creating a solution. Streams has the potential to be a
> > solution for ingesting, aggregating and analyzing activity data (not
> > limited to activitystre.ms data). If you make it all about the platform all
> > you have is a tool other developers can use to build solutions. I think
> > there is value in having Streams be a solution (or at least partial
> > solution). If streams could collect activity data (whatever format), store
> > it, aggregate it and provide analytics on that data as a package I think
> > you've won. There could be another activity in the future to pull out some
> > of the infrastructure code and make another project that was a generic
> > processing platform that streams happened to use.
> >
> > One other note is I think if storm is a requirement you are going to limit
> > your customer base as well.
> >
> > I will say though that I'm just providing my opinion. I'm not a committer
> > or PMC member so I don't even really have a vote but as an outside
> > observer, and someone who's seen projects succeed/fail, these are a few
> > thoughts.
> >
> > Chris
> >
> > On Thu, Apr 17, 2014 at 8:27 PM, Steve Blackmon <steve@blackmon.org> wrote:
> >
> > > Chris, I think you are right that the group should focus our efforts,
> > > and that online activities (broadly defined) are the sweet spot.  I
> > > just wouldn't want to give potential users or contributors the idea
> > > that Streams is just for ActivityStreams - which I at least associate
> > > with small data sets.  At least they look small viewed through Jira,
> > > Jive, and similar tools.  Streams is also a big data processing engine
> > > which can take advantage of the best features of storm or yarn while
> > > significantly reducing the learning curve and code complexity of those
> > > frameworks.
> >
> >
> > > So long as the website makes it clear that activity data is a concept
> > > and Streams can work regardless of how the data and metadata are
> > > shaped, I'm cool with "Real-time Processing for Activity Data Streams"
> > > as a tagline.
> > >
> > > Steve
> > >
> > > On Thu, Apr 17, 2014 at 8:04 PM, Chris Geer <chris@cxtsoftware.com> wrote:
> > > > On Thu, Apr 17, 2014 at 9:32 AM, Steve Blackmon <steve@blackmon.org> wrote:
> > > >
> > > > >> >> Target audience is our potential users.  Technical in nature, but it still
> > > > >> >> needs to be succinct.
> > > >> >>
> > > >> >
> > > > >> > Ok, with that said, I think the tag-line should be more feature focused
> > > > >> > because that can hook both the tech guys and business guys.
> > > >>
> > > >> Agreed
> > > >>
> > > > >> > We also need to be careful just using the term "streams" because really this isn't a
> > > > >> > generic stream processor (aka storm), our focus is on Activity Streams.
> > > > >> > Maybe activity streams is a bad descriptor as well and Activity Data might
> > > > >> > be better. "Real-time Processing for Activity Data Streams"???
> > > >> >
> > > >>
> > > > >> The engine actually doesn't care whether documents being processed are
> > > > >> activity-related or not:
> > > > >> any JVM object that Jackson can serialize and deserialize works just
> > > > >> fine as a datum.
> > > >>
> > > > >> I think we can acknowledge that the community has a bias toward
> > > > >> ActivityStreams, but we shouldn't
> > > > >> downplay the flexibility Streams provides.  Focusing only on activity
> > > > >> data in project messaging
> > > > >> undercuts the fact that Streams is a powerful, flexible ESB/ETL
> > > > >> replacement.
> > > >>
> > > >
> > > > > My 2-cents for what it's worth. If we don't focus on a niche this won't
> > > > > take off. ESB/ETL systems are a dime-a-dozen and to be really good in that
> > > > > space is a big endeavor. I'm not saying this system couldn't fill some of
> > > > > those needs but I think it's a bad idea to be that broad.
> > > >
> > > >>
> > > >> >>
> > > >> >>
> > > >> >> >
> > > >> >> > >
> > > >> >> > > ?
> > > >> >> > >
> > > > >> >> > > On Thu, Apr 17, 2014 at 8:26 AM, Matt Franklin <m.ben.franklin@gmail.com> wrote:
> > > > >> >> > > > On Mon, Apr 14, 2014 at 5:22 PM, Renato Marroquín Mogrovejo <renatoj.marroquin@gmail.com> wrote:
> > > >> >> > > >
> > > >> >> > > >> Hi devs,
> > > >> >> > > >>
> > > > >> >> > > >> Yeah the title was indeed compelling. You got me on that one lol
> > > > >> >> > > >> I think that you guys are right saying that for attracting new people maybe
> > > > >> >> > > >> we should try making the project's goal something more applicable in real
> > > > >> >> > > >> life than just being "a Lightweight server for ActivityStreams".
> > > > >> >> > > >> I liked the simple explanation I heard, maybe it was the pisco but please
> > > > >> >> > > >> correct me if I am wrong, "it's an abstraction layer for stream processing
> > > > >> >> > > >> engines". IMHO we have two things defined:
> > > >> >> > > >>
> > > > >> >> > > >> MISSION:
> > > > >> >> > > >> 1)  A flexible data processing framework that can run in multiple different
> > > > >> >> > > >> runtimes.  The goal being to abstract platform complexity and allow for
> > > > >> >> > > >> business logic reuse across real-time, enterprise, web and stand-alone
> > > > >> >> > > >> executions.
> > > >> >> > > >>
> > > >> >> > > >> This is what needs to be done.
> > > >> >> > > >
> > > >> >> > > >
> > > > >> >> > > >> VISION:
> > > > >> >> > > >> 2)  As a proving ground for the adoption of data format standards,
> > > > >> >> > > >> specifically ActivityStreams to start.  The community would work to drive
> > > > >> >> > > >> the adoption and evolution of such standards through real-world experience.
> > > >> >> > > >>
> > > > >> >> > > >> This is where we would like to get to at some point. But also, to get more of the
> > > > >> >> > > >> community engaged, things have to be simple. That is a big issue we still have
> > > > >> >> > > >> over in Gora, and we are trying to solve it through talks, better
> > > > >> >> > > >> tutorials, integration with other projects, and so forth.
> > > > >> >> > > >> Just my 2cents guys.
> > > >> >> > > >>
> > > >> >> > > >
> > > > >> >> > > > So what is the tag line that sums up both the mission and the vision?
> > > >> >> > > >
> > > >> >> > > >
> > > >> >> > > >>
> > > >> >> > > >>
> > > >> >> > > >> Renato M.
> > > >> >> > > >>
> > > >> >> > > >>
> > > >> >> > > >> 2014-04-14 16:31 GMT+02:00 Matt Franklin
<
> > > >> m.ben.franklin@gmail.com
> > > >> >> >:
> > > >> >> > > >>
> > > >> >> > > >> > On Fri, Apr 11, 2014 at 5:01 PM,
Steve Blackmon <
> > > >> >> > sblackmon@apache.org
> > > >> >> > > >> > >wrote:
> > > >> >> > > >> >
> > > >> >> > > >> > > On Thu, Apr 10, 2014 at 4:11
PM, Matt Franklin <
> > > >> >> > > >> m.ben.franklin@gmail.com
> > > >> >> > > >> > >
> > > >> >> > > >> > > wrote:
> > > >> >> > > >> > > > tl;dr version:
> > > >> >> > > >> > > >
> > > > >> >> > > >> > > > We need to discuss things on the list more and work to define streams,
> > > > >> >> > > >> > > > update our public presence to support this definition and encourage
> > > > >> >> > > >> > > > additional engagement.
> > > >> >> > > >> > > >
> > > >> >> > > >> > > +1, +1, +1
> > > >> >> > > >> > >
> > > >> >> > > >> > > > Long version:
> > > >> >> > > >> > > >
> > > > >> >> > > >> > > > For those of you unaware, Steve Blackmon gave a nice talk on the work he
> > > > >> >> > > >> > > > has been committing to Streams at ApacheCon.  As part of that talk and
> > > > >> >> > > >> > > > follow on discussions, it became clear that we as a community need to do
> > > > >> >> > > >> > > > some serious work to define ourselves, what we are building and why it is
> > > > >> >> > > >> > > > valuable to the industry.
> > > >> >> > > >> > > >
> > > > >> >> > > >> > > If anyone who missed the presentation wants to see it, I'm happy to
> > > > >> >> > > >> > > host a google hangout to run through it.
> > > >> >> > > >> > >
> > > >> >> > > >> >
> > > > >> >> > > >> > Can you post it, or a link to it, on the website too?
> > > >> >> > > >> >
> > > >> >> > > >> >
> > > >> >> > > >> > >
> > > > >> >> > > >> > > > Our website says we are a Lightweight server for ActivityStreams.  While
> > > > >> >> > > >> > > > this is true to some degree, I think recent contributions should refine
> > > > >> >> > > >> > > > this.  The new code is really about supporting flexible processing,
> > > > >> >> > > >> > > > persistence and retrieval of data in multiple runtimes using strongly
> > > > >> >> > > >> > > > typed, normalized data formats like ActivityStreams.  Personally, I think
> > > > >> >> > > >> > > > this slightly new direction is extremely compelling, and the reaction to
> > > > >> >> > > >> > > > Steve's talk seems to support that.  The question remains how does the
> > > > >> >> > > >> > > > community as a whole see the project?  What value is everyone wanting to
> > > > >> >> > > >> > > > get out of this effort?
> > > >> >> > > >> > > >
> > > > >> >> > > >> > > The session tag-line which attracted ~20 attendees was 'Simplifying
> > > > >> >> > > >> > > Real-Time data integration with Apache Streams.' From talking to
> > > > >> >> > > >> > > coders and data scientists I always hear frustration with how much
> > > > >> >> > > >> > > time they spend writing code and workflow to move bytes around and
> > > > >> >> > > >> > > keep track of their data assets. I'd wager any survey of prominent
> > > > >> >> > > >> > > open-source libraries and popular commercial APIs would have to
> > > > >> >> > > >> > > conclude that schema and interface standards are completely absent
> > > > >> >> > > >> > > or sparsely adopted at many layers.
> > > >> >> > > >> > >
> > > > >> >> > > >> > > Standards in hardware, operating systems, networks, and relational
> > > > >> >> > > >> > > databases brought about flourishing ecosystems. I believe standards in
> > > > >> >> > > >> > > data interchange such as ActivityStreams can do the same for the
> > > > >> >> > > >> > > social web, but not everyone will embrace standards for the sake of
> > > > >> >> > > >> > > standards. If we can offer integration points to the data sources and
> > > > >> >> > > >> > > repositories businesses want to work with, and demonstrate that
> > > > >> >> > > >> > > Streams can handle 'fire-hose' scale data volumes with arbitrarily
> > > > >> >> > > >> > > many intermediate hand-offs and processing steps on messages in
> > > > >> >> > > >> > > flight, I think we will see adoption from enterprises looking to
> > > > >> >> > > >> > > replace ESB-type systems that can't keep up with the volume of data
> > > > >> >> > > >> > > generated (both inside and outside their networks) that they want to
> > > > >> >> > > >> > > track.  Streams is pretty decent at ETL as well - a function that is
> > > > >> >> > > >> > > never going away, even as the underlying tools best suited to
> > > > >> >> > > >> > > performing it at scale constantly change.
> > > >> >> > > >> > >
> > > > >> >> > > >> > > This future-state I'm attempting to describe will be a better one for
> > > > >> >> > > >> > > researchers, hobbyists, entrepreneurs, and consumers of web products
> > > > >> >> > > >> > > and services.  Configuration-driven, runtime-platform agnostic,
> > > > >> >> > > >> > > software for real-time data exchange:  where community-driven
> > > > >> >> > > >> > > standards such as Activity Streams can codify and evolve
> > > > >> >> > > >> > > best-practices via running code.  That is a vision that I think will
> > > > >> >> > > >> > > help us generate significant traction going forward.
> > > >> >> > > >> > >
> > > >> >> > > >> >
> > > > >> >> > > >> > Just to make sure I am understanding you correctly, you are proposing we
> > > > >> >> > > >> > update the mission of the project to the following:
> > > >> >> > > >> >
> > > >> >> > > >> > 1)  A flexible data processing framework
that can run in
> > > >> multiple
> > > >> >> > > >> different
> > > >> >> > > >> > runtimes.  The goal being to abstract
platform
> complexity
> > > and
> > > >> >> allow
> > > >> >> > > for
> > > >> >> > > >> > business logic reuse across real-time,
enterprise, web
> and
> > > >> >> > stand-alone
> > > >> >> > > >> > executions.
> > > >> >> > > >> > 2)  As a proving ground for the adoption
of data format
> > > >> standards,
> > > >> >> > > >> > specifically ActivityStreams to start.
 The community
> would
> > > >> work
> > > >> >> to
> > > >> >> > > drive
> > > >> >> > > >> > the adoption and evolution of such
standards through
> > > real-world
> > > >> >> > > >> experience.
> > > >> >> > > >> >
> > > > >> >> > > >> > This sounds great, though it is slightly different than the initially
> > > > >> >> > > >> > proposed functionality.  Personally, I have no objection to that, as what
> > > > >> >> > > >> > you describe encompasses the original goals and expands on them; but, it
> > > > >> >> > > >> > would be good for the rest of the community to weigh in.
> > > >> >> > > >> >
> > > >> >> > > >> >
> > > >> >> > > >> > >
> > > > >> >> > > >> > > > The fact that there are not clear answers (and corresponding documented
> > > > >> >> > > >> > > > statements on the website) to these questions already means we are not
> > > > >> >> > > >> > > > doing a great job of following the Apache Way.  The Apache Way is about the
> > > > >> >> > > >> > > > community and meritocratic, community-based decision making.  The ASF
> > > > >> >> > > >> > > > defines it as follows:
> > > >> >> > > >> > > >
> > > > >> >> > > >> > > > While there is not an official list, these six principles have been cited
> > > > >> >> > > >> > > > as the core beliefs of philosophy behind the foundation, which is normally
> > > > >> >> > > >> > > > referred to as "The Apache Way":
> > > >> >> > > >> > > >
> > > > >> >> > > >> > > > collaborative software development
> > > > >> >> > > >> > > >
> > > > >> >> > > >> > > > commercial-friendly standard license
> > > > >> >> > > >> > > >
> > > > >> >> > > >> > > > consistently high quality software
> > > > >> >> > > >> > > >
> > > > >> >> > > >> > > > respectful, honest, technical-based interaction
> > > > >> >> > > >> > > >
> > > > >> >> > > >> > > > faithful implementation of standards
> > > > >> >> > > >> > > >
> > > > >> >> > > >> > > > security as a mandatory feature
> > > > >> >> > > >> > > >
> > > > >> >> > > >> > > > All of the ASF projects share these principles.
> > > >> >> > > >> > > >
> > > > >> >> > > >> > > > Let's make sure we propose changes to the list, create tickets that support
> > > > >> >> > > >> > > > wider efforts and leverage principles like lazy consensus to keep moving
> > > > >> >> > > >> > > > forward in a way that supports the community.
> > > >> >> > > >> > > +1, +1, +1
> > > >> >> > > >> > >
> > > >> >> > > >> > >
> > > >> >> > > >> > >
> > > >> >> > > >> > >
> > > >> >> > > >> > > --
> > > >> >> > > >> > > Steve Blackmon
> > > >> >> > > >> > > sblackmon@apache.org
> > > >> >> > > >> > >
> > > >> >> > > >> >
> > > >> >> > > >>
> > > >> >> > >
> > > >> >> >
> > > >> >>
> > > >>
> > >
>
>
