streams-dev mailing list archives

From sblackmon <>
Subject Social Media Metrics using Apache stack
Date Mon, 21 Nov 2016 17:48:26 GMT
Hello ComDev,

The Streams podling has been brainstorming ways to increase awareness of the project and its
capabilities.  We’ve also been working to make it easier to get started as a user, without
starting the journey by downloading a JDK, Maven, and friends.  Using the software to provide
benefit to the Foundation seems like a good thing to try.

One use case for Streams is to build personal or organizational datasets of social media profiles
and content for internal development and analysis, using the technologies and tools you and
your organization prefer, rather than those provided by the upstream system.

I took the liberty of creating a few Zeppelin notebooks which collect Apache project profiles
and posts, normalize them to activity streams format, and interact with them using Spark data

The notebooks are currently hosted in my zeppelinhub account, which anyone with the link below
can access.

If this group sees potential benefit, I’d be happy to work to set them up for use by anyone
at Apache in a dedicated Zeppelin deployment and take the lead on maintaining them going forward.

In any case, we’d appreciate any feedback on what would make this prototype more valuable.

Background on Streams:

Apache Streams (incubating) unifies a diverse world of digital profiles and online activities
into common formats and vocabularies, and makes these datasets accessible across a variety
of databases, devices, and platforms for streaming, browsing, search, sharing, and analytics.

Streams contains libraries and patterns for specifying, publishing, and inter-linking schemas,
and assists with conversion of activities (posts, shares, likes, follows, etc.) and objects
(profiles, pages, photos, videos, etc.) between the representation, format, and encoding preferred
by supported data providers (Twitter, Instagram, etc.) and storage services (Cassandra, Elasticsearch,
HBase, HDFS, Neo4J, etc.).
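
To make the conversion idea concrete, here is a minimal sketch in Python of normalizing a provider-specific post into an activity using Activity Streams 1.0 style fields (actor, verb, object, published). This is not the Streams API, which is a Java library. The input field names (`id`, `user`, `text`, `created_at`), the `id:provider:...` identifier scheme, and the function `normalize_post` are all assumptions made up for illustration.

```python
import json


def normalize_post(raw, provider):
    """Map a hypothetical provider post onto Activity Streams 1.0 style fields."""
    return {
        # Identifier scheme is an assumption for this sketch.
        "id": f"id:{provider}:post:{raw['id']}",
        "verb": "post",
        "published": raw["created_at"],
        "provider": {"id": f"id:providers:{provider}"},
        "actor": {
            "objectType": "person",
            "id": f"id:{provider}:{raw['user']['id']}",
            "displayName": raw["user"]["name"],
        },
        "object": {
            "objectType": "note",
            "content": raw["text"],
        },
    }


# Hypothetical raw record, shaped loosely like a social media post.
raw_post = {
    "id": "1001",
    "created_at": "2016-11-21T17:48:26Z",
    "text": "Hello from an Apache project",
    "user": {"id": "42", "name": "Example Project"},
}

activity = normalize_post(raw_post, "twitter")
print(json.dumps(activity, indent=2))
```

Once every provider's records are mapped onto the same field names, downstream storage and analysis code no longer needs to know which system a record came from.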

In theory, pretty much any JSON or XML API which uses a "look-up by ID and type" model can
be coerced into collections of activity-streams normalized profiles and posts. Systems such
as GitHub, JIRA, and MeetUp could be added to the roadmap, with notebooks created once those
providers are built.
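
As a rough illustration of the "look-up by ID and type" idea, the Python sketch below walks a list of IDs, fetches each record by type and ID, and emits a collection of normalized profile objects. Everything here is hypothetical: `FAKE_API` stands in for a remote JSON API, and `lookup`, `collect_profiles`, the record fields, and the identifier scheme are assumptions for illustration, not part of Streams or of any real provider's API.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical stand-in for a remote JSON API: records keyed by (type, id).
FAKE_API: Dict[tuple, dict] = {
    ("user", "1"): {"login": "alice", "name": "Alice"},
    ("user", "2"): {"login": "bob", "name": "Bob"},
}


def lookup(record_type: str, record_id: str) -> dict:
    """Hypothetical look-up by ID and type; a real provider would call an HTTP API."""
    return FAKE_API[(record_type, record_id)]


def collect_profiles(
    provider: str,
    record_type: str,
    ids: Iterable[str],
    fetch: Callable[[str, str], dict] = lookup,
) -> List[dict]:
    """Fetch each record by type and ID and normalize it into a profile object."""
    profiles = []
    for record_id in ids:
        raw = fetch(record_type, record_id)
        profiles.append({
            "objectType": "person",
            # Identifier scheme is an assumption for this sketch.
            "id": f"id:{provider}:{record_type}:{record_id}",
            "displayName": raw["name"],
        })
    return profiles


profiles = collect_profiles("github", "user", ["1", "2"])
for profile in profiles:
    print(profile)
```

Because the fetch function is passed in as a parameter, supporting another system in this sketch only means supplying a different look-up function and field mapping.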