streams-dev mailing list archives

From sblackmon <sblack...@apache.org>
Subject Ease-of-use : minimizing TTHW (time-to-hello-world)
Date Thu, 06 Oct 2016 18:55:53 GMT
 
TL;DR I’ve found a way to dramatically reduce barriers to using streams as a beginner.

With the streams 0.3 release, getting started is quite a headache for a novice. We have
a tutorial on the website, but it’s quite a journey. You have to check out all three repos
and install each of them in order before you get a jar file you can use to get data. Only then
can you run a few pre-canned streams, and those are intermediate-level, not beginner-level.

In an ideal world, anyone would be able to yum or apt-get (or docker pull) individual providers
or processors and run them on their own without building from source or composing them into
multi-step streams.  

We'd have to increase our build and compliance complexity significantly to publish official binaries.
So what can we do to drop the learning curve precipitously without doing that?

Providers are really simple to run. The hard part is getting all of the right classes and
configuration properties into a JVM. Inspired by how Zeppelin’s %dep interpreter reduces
the friction in composing and running a Scala notebook, I wanted to find a way to get the
same ability from a Linux shell.

The commands below go from just a Java installation to flat files of Twitter data in just
a few minutes.

I think until we have binary distributions, this is how our tutorials should tell the world
to get started with streams.  

Thoughts?  

-----  

# install sbtx

curl -s https://raw.githubusercontent.com/paulp/sbt-extras/master/sbt > /usr/bin/sbtx &&
chmod 0755 /usr/bin/sbtx

# create a workspace

mkdir twitter-test; cd twitter-test;

# supply a config file with credentials

cat > application.conf << EOF
twitter {
  oauth {
    consumerKey = ""
    consumerSecret = ""
    accessToken = ""
    accessTokenSecret = ""
  }
  retrySleepMs = 5000
  retryMax = 250
  info = [
    18055613
  ]
}
EOF
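
The four oauth values above are left blank and need real credentials before any provider will authenticate. A minimal sketch of a pre-flight check, assuming the file keeps the `key = ""` layout shown above; `empty_keys` is a hypothetical helper and not part of streams:

```shell
# Hypothetical helper (not part of streams): print the name of every key
# whose value is still the empty string, i.e. a line of the form  key = ""
empty_keys() {
  sed -n 's/^[[:space:]]*\([A-Za-z]*\)[[:space:]]*=[[:space:]]*""[[:space:]]*$/\1/p' "$1"
}

# Only check when the config file from the step above is present.
if [ -f application.conf ]; then
  missing=$(empty_keys application.conf)
  if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the key names print on one line.
    echo "fill in these keys before starting sbtx:" $missing
  fi
fi
```

Run from the workspace directory, this prints nothing once every credential is filled in.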

sbtx -210 -sbt-create

set resolvers += "Local Maven Repository" at "file://"+Path.userHome.absolutePath+"/.m2/repository"

set libraryDependencies += "org.apache.streams" % "streams-provider-twitter" % "0.4-incubating-SNAPSHOT"

set fork := true

run-main org.apache.streams.twitter.provider.TwitterUserInformationProvider application.conf users.txt

run-main org.apache.streams.twitter.provider.TwitterTimelineProvider application.conf statuses.txt

set javaOptions += "-Dtwitter.endpoint=friends"

run-main org.apache.streams.twitter.provider.TwitterFollowingProvider application.conf friends.txt

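set javaOptions += "-Dtwitter.endpoint=followers"

run-main org.apache.streams.twitter.provider.TwitterFollowingProvider application.conf followers.txt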

exit
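
The resolver, dependency, and fork settings typed with `set` in the session above last only for that session. One way to keep them, sketched here under the assumption that sbtx is started from the same workspace directory, is to write them to a `build.sbt` once. The setting values are copied from the session; the file itself is an addition, not something the steps above create:

```shell
# Write the session's three settings to build.sbt in the current workspace.
# The quoted 'EOF' keeps the shell from expanding anything inside the heredoc.
# Blank lines between settings keep the file readable by older sbt 0.13 releases.
cat > build.sbt << 'EOF'
resolvers += "Local Maven Repository" at "file://" + Path.userHome.absolutePath + "/.m2/repository"

libraryDependencies += "org.apache.streams" % "streams-provider-twitter" % "0.4-incubating-SNAPSHOT"

fork := true
EOF
```

With that file in place, the `set` lines should not need retyping, and it may be possible to pass a single command as a quoted argument instead of opening the prompt, for example `sbtx -210 "run-main org.apache.streams.twitter.provider.TwitterUserInformationProvider application.conf users.txt"`.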

ls -l

Steves-MacBook-Pro-3:twitter sblackmon$ ls -l
-rw-r--r--@ 1 sblackmon staff 356 Oct 6 11:54 application.conf
-rw-r--r-- 1 sblackmon staff 293780 Oct 6 13:42 followers.txt
-rw-r--r-- 1 sblackmon staff 6260 Oct 6 13:43 friends.txt
drwxr-xr-x 3 sblackmon staff 102 Oct 6 10:17 project
-rw-r--r-- 1 sblackmon staff 3339460 Oct 6 13:43 statuses.txt
drwxr-xr-x 6 sblackmon staff 204 Oct 6 10:19 target
-rw-r--r-- 1 sblackmon staff 3321 Oct 6 13:43 users.txt
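
File sizes alone do not say how many records came back. A quick sketch for counting them, assuming each provider writes one record per line (an assumption, not something confirmed above); `count_records` is a hypothetical helper:

```shell
# Hypothetical helper: print "<line count><TAB><file>" for each output file.
# Assumption: one record per line in the provider output.
count_records() {
  for f in "$@"; do
    if [ -f "$f" ]; then
      # tr strips the padding some wc implementations put before the count.
      printf '%s\t%s\n' "$(wc -l < "$f" | tr -d ' ')" "$f"
    else
      printf 'missing\t%s\n' "$f"
    fi
  done
}

count_records users.txt statuses.txt friends.txt followers.txt
```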


