spark-issues mailing list archives

From "Andrew Davidson (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-13065) streaming-twitter pass twitter4j.FilterQuery argument to TwitterUtils.createStream()
Date Thu, 28 Jan 2016 21:02:39 GMT
Andrew Davidson created SPARK-13065:
---------------------------------------

             Summary: streaming-twitter pass twitter4j.FilterQuery argument to TwitterUtils.createStream()
                 Key: SPARK-13065
                 URL: https://issues.apache.org/jira/browse/SPARK-13065
             Project: Spark
          Issue Type: Improvement
          Components: Streaming
    Affects Versions: 1.6.0
         Environment: all
            Reporter: Andrew Davidson
            Priority: Minor


The Twitter streaming API is very powerful and provides a lot of support for server-side (twitter.com)
filtering of status objects. Whenever possible we want to let Twitter do as much of the work as
possible for us.

Currently the Spark Twitter API only allows you to configure a small subset of the possible filters (track keywords):


String[] filters = {"tag1", "tag2"};
JavaDStream<Status> tweets = TwitterUtils.createStream(ssc, twitterAuth, filters);

The current implementation in TwitterInputDStream.scala does:

private[streaming]
class TwitterReceiver(
    twitterAuth: Authorization,
    filters: Seq[String],
    storageLevel: StorageLevel
  ) extends Receiver[Status](storageLevel) with Logging {

. . .


      val query = new FilterQuery
      if (filters.size > 0) {
        query.track(filters.mkString(","))
        newTwitterStream.filter(query)
      } else {
        newTwitterStream.sample()
      }

...

Rather than constructing the FilterQuery object in TwitterReceiver.onStart(), we should be able
to pass a FilterQuery object to TwitterUtils.createStream().
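A minimal sketch of the proposed change, assuming the receiver would take an optional, caller-built query instead of a keyword list and keep falling back to sample() when no query is given. To keep it runnable without Spark or twitter4j on the classpath, QuerySpec, StreamCall, SampleCall, FilterCall and ReceiverSketch are hypothetical stand-ins for twitter4j.FilterQuery and the filter()/sample() calls; they are not the real API.

```scala
// HYPOTHETICAL stand-ins only: not the twitter4j or Spark API.

// Stand-in for twitter4j.FilterQuery. The real class supports more than
// track keywords (e.g. follow by user id, locations); only track and
// follow are modelled here.
final case class QuerySpec(track: Seq[String] = Nil, follow: Seq[Long] = Nil)

// Which Streaming API call the receiver ends up making.
sealed trait StreamCall
case object SampleCall extends StreamCall
final case class FilterCall(query: QuerySpec) extends StreamCall

object ReceiverSketch {
  // Today: only track keywords can be expressed, and the receiver builds
  // the query itself inside onStart().
  def currentOnStart(filters: Seq[String]): StreamCall =
    if (filters.nonEmpty) FilterCall(QuerySpec(track = filters)) else SampleCall

  // Proposal: the caller passes a fully configured query; with no query
  // the receiver falls back to the sample stream, as it does today.
  def proposedOnStart(query: Option[QuerySpec]): StreamCall =
    query.map(q => FilterCall(q)).getOrElse(SampleCall)
}
```

Using an Option keeps the existing behaviour for callers that supply nothing, while keyword-only callers could still be served by wrapping their keywords in a query, e.g. ReceiverSketch.proposedOnStart(Some(QuerySpec(track = Seq("tag1", "tag2"), follow = Seq(12L)))).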

This looks like an easy fix. See the source code links below.

Kind regards,

Andy

https://github.com/apache/spark/blob/master/external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala#L60

https://github.com/apache/spark/blob/master/external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala#L89



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
