streams-dev mailing list archives

From Jason Letourneau <jletournea...@gmail.com>
Subject Re: Streams Subscriptions
Date Fri, 01 Feb 2013 19:31:42 GMT
A slight iteration for clarity:

{
    "auth_token": "token",
    "filters": [
        {
            "field": "fieldname",
            "comparison_operator": "operator",
            "value_set": [
                "val1",
                "val2"
            ]
        }
    ],
    "outputs": [
        {
            "output_type": "http",
            "method": "post",
            "url": "http.example.com:8888",
            "delivery_frequency": "60",
            "max_size": "10485760",
            "auth_type": "none",
            "username": "username",
            "password": "password"
        }
    ]
}
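For what it's worth, a filter entry shaped like the one above could be evaluated along these lines. This is a minimal Python sketch; the operator names ("equals", "gt", etc.) and the any-value-matches (OR/IN) semantics are assumptions for illustration, not part of the proposal:

```python
# Minimal sketch of evaluating one filter entry against an activity dict.
# Operator names are illustrative assumptions, not a defined vocabulary.

OPERATORS = {
    "equals":     lambda a, b: a == b,
    "not_equals": lambda a, b: a != b,
    "gt":         lambda a, b: a > b,
    "gte":        lambda a, b: a >= b,
    "lt":         lambda a, b: a < b,
    "lte":        lambda a, b: a <= b,
}

def lookup(activity, dotted_field):
    """Resolve a dotted field name like 'object.type' in a nested dict."""
    value = activity
    for part in dotted_field.split("."):
        value = value[part]
    return value

def matches(activity, flt):
    """True if the activity's field matches ANY value in value_set (OR/IN)."""
    op = OPERATORS[flt["comparison_operator"]]
    actual = lookup(activity, flt["field"])
    return any(op(actual, expected) for expected in flt["value_set"])

activity = {"verb": "post", "object": {"type": "blogpost"}}
flt = {"field": "object.type", "comparison_operator": "equals",
       "value_set": ["blogpost", "comment"]}
print(matches(activity, flt))  # → True
```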

On Fri, Feb 1, 2013 at 12:51 PM, Jason Letourneau
<jletourneau80@gmail.com> wrote:
> So a subscription URL (result of setting up a subscription) is for all
> intents and purposes representative of a set of filters.  That
> subscription can be told to do a variety of things for delivery to the
> subscriber, but the identity of the subscription is rooted in its
> filters.  Posting additional filters or additional output configurations
> to the subscription URL affects the behavior of that subscription by
> adding more filters or more outputs (removal works the same way).
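A sketch of what that could look like on the wire, POSTing one additional output config to an existing subscription URL. The host, path, and auth header here are hypothetical, mirroring the fields in the JSON above:

```shell
curl -X POST 'http://streams.example.com/subscriptions/123/outputs' \
  -H 'Content-Type: application/json' \
  -H 'Auth: streams-user:your-api-token' \
  -d '{"output_type": "http", "method": "post",
       "url": "http.example.com:8888", "delivery_frequency": "60"}'
```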
>
> On Fri, Feb 1, 2013 at 12:17 PM, Craig McClanahan <craigmcc@gmail.com> wrote:
>> A couple of thoughts.
>>
>> * On "outputs" you list "username" and "password" as possible fields.
>>   I presume that specifying these would imply using HTTP Basic auth?
>>   We might want to consider different options as well.
>>
>> * From my (possibly myopic :-) viewpoint, the filtering and delivery
>>   decisions are different object types.  I'd like to be able to register
>>   my set of filters and get a unique identifier for them, and then
>>   separately be able to say "send the results of subscription 123
>>   to this webhook URL every 60 minutes".
>>
>> * Regarding query syntax, pretty much any sort of simple patterns
>>   are probably not going to be sufficient for some use cases.  Maybe
>>   we should offer that as simple defaults, but also support falling back
>>   to some sort of SQL-like syntax (i.e. what JIRA does on the
>>   advanced search).
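To make the "SQL-like fallback" concrete, an advanced filter expression in the JIRA/JQL spirit might look something like the following. The syntax is purely illustrative, not a proposal:

```
provider.id = 'twitter' AND verb IN ('post', 'share') AND published >= '2013-01-01'
```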
>>
>> Craig
>>
>>
>> On Fri, Feb 1, 2013 at 8:55 AM, Jason Letourneau <jletourneau80@gmail.com>
>> wrote:
>>>
>>> Based on Steve and Craig's feedback, I've come up with something that
>>> I think can work.  Below it specifies that:
>>> 1) you can set up more than one subscription at a time
>>> 2) each subscription can have many outputs
>>> 3) each subscription can have many filters
>>>
>>> The details of the config would do things like determine the behavior
>>> of the stream delivery (is it posted back or is the subscriber polling
>>> for instance).  Also, all subscriptions created in this way would be
>>> accessed through a single URL.
>>>
>>> {
>>>     "auth_token": "token",
>>>     "subscriptions": [
>>>         {
>>>             "outputs": [
>>>                 {
>>>                     "output_type": "http",
>>>                     "method": "post",
>>>                     "url": "http.example.com:8888",
>>>                     "delivery_frequency": "60",
>>>                     "max_size": "10485760",
>>>                     "auth_type": "none",
>>>                     "username": "username",
>>>                     "password": "password"
>>>                 }
>>>             ]
>>>         },
>>>         {
>>>             "filters": [
>>>                 {
>>>                     "field": "fieldname",
>>>                     "comparison_operator": "operator",
>>>                     "value_set": [
>>>                         "val1",
>>>                         "val2"
>>>                     ]
>>>                 }
>>>             ]
>>>         }
>>>     ]
>>> }
>>>
>>> Thoughts?
>>>
>>> Jason
>>>
>>> On Thu, Jan 31, 2013 at 7:53 PM, Craig McClanahan <craigmcc@gmail.com>
>>> wrote:
>>> > Welcome Steve!
>>> >
>>> > DataSift's UI to set these things up is indeed pretty cool.  I think
>>> > what
>>> > we're talking about here is more what the internal REST APIs between the
>>> > UI
>>> > and the back end might look like.
>>> >
>>> > I also think we should deliberately separate the filter definition of a
>>> > "subscription" from the instructions on how the data gets delivered.  I
>>> > could see use cases for any or all of:
>>> > * Polling with a filter on oldest date of interest
>>> > * Webhook that gets updated at some specified interval
>>> > * URL to which the Streams server would periodically POST
>>> >   new activities (in case I don't have webhooks set up)
>>> >
>>> > Separately, looking at DataSift is a reminder we will want to be able to
>>> > filter on words inside an activity stream value like "subject" or
>>> > "content", not just on the entire value.
>>> >
>>> > Craig
>>> >
>>> > On Thu, Jan 31, 2013 at 4:29 PM, Jason Letourneau
>>> > <jletourneau80@gmail.com> wrote:
>>> >
>>> >> Hi Steve - thanks for the input and congrats on your first post - I
>>> >> think what you are describing is where Craig and I are circling around
>>> >> (or something similar anyways) - the details on that POST request are
>>> >> really helpful in particular.  I'll try and put something together
>>> >> tomorrow that would be a start for the "setup" request (and subsequent
>>> >> additional configuration after the subscription is initialized) and
>>> >> post back to the group.
>>> >>
>>> >> Jason
>>> >>
>>> >> On Thu, Jan 31, 2013 at 7:00 PM, Steve Blackmon [W2O Digital]
>>> >> <sblackmon@w2odigital.com> wrote:
>>> >> > First post from me (btw I am Steve, stoked about this project and
>>> >> > meeting
>>> >> > everyone eventually.)
>>> >> >
>>> >> > Sorry if I missed the point of the thread, but I think this is related
>>> >> > and might be educational for some in the group.
>>> >> >
>>> >> > I like the way DataSift's API lets you establish streams - you POST a
>>> >> > definition, it returns a hash, and thereafter their service follows the
>>> >> > instructions you gave it as new messages meet the filter you defined.
>>> >> > In addition, once a stream exists, you can set up listeners on that
>>> >> > specific hash via web sockets.
>>> >> >
>>> >> > For example, here is how you instruct DataSift to push new messages
>>> >> > meeting your criteria to a WebHooks end-point.
>>> >> >
>>> >> > curl -X POST 'https://api.datasift.com/push/create' \
>>> >> > -d 'name=connectorhttp' \
>>> >> > -d 'hash=dce320ce31a8919784e6e85aecbd040e' \
>>> >> > -d 'output_type=http' \
>>> >> > -d 'output_params.method=post' \
>>> >> > -d 'output_params.url=http.example.com:8888' \
>>> >> > -d 'output_params.use_gzip' \
>>> >> > -d 'output_params.delivery_frequency=60' \
>>> >> > -d 'output_params.max_size=10485760' \
>>> >> > -d 'output_params.verify_ssl=false' \
>>> >> > -d 'output_params.auth.type=none' \
>>> >> > -d 'output_params.auth.username=YourHTTPServerUsername' \
>>> >> > -d 'output_params.auth.password=YourHTTPServerPassword' \
>>> >> > -H 'Auth: datasift-user:your-datasift-api-key'
>>> >> >
>>> >> >
>>> >> > Now new messages get pushed to me every 60 seconds, and I can get the
>>> >> > feed in real-time like this:
>>> >> >
>>> >> > var websocketsUser = 'datasift-user';
>>> >> > var websocketsHost = 'websocket.datasift.com';
>>> >> > var streamHash = 'dce320ce31a8919784e6e85aecbd040e';
>>> >> > var apiKey = 'your-datasift-api-key';
>>> >> >
>>> >> >
>>> >> > var ws = new WebSocket('ws://'+websocketsHost+'/'+streamHash+
>>> >> >     '?username='+websocketsUser+'&api_key='+apiKey);
>>> >> >
>>> >> > ws.onopen = function(evt) {
>>> >> >     // connection event
>>> >> >         $("#stream").append('open: '+evt.data+'<br/>');
>>> >> > }
>>> >> >
>>> >> > ws.onmessage = function(evt) {
>>> >> >     // parse received message
>>> >> >         $("#stream").append('message: '+evt.data+'<br/>');
>>> >> > }
>>> >> >
>>> >> > ws.onclose = function(evt) {
>>> >> >     // parse event
>>> >> >         $("#stream").append('close: '+evt.data+'<br/>');
>>> >> > }
>>> >> >
>>> >> > // No event object is passed to the error callback, so no useful
>>> >> > // debugging can be done here
>>> >> > ws.onerror = function() {
>>> >> >     // Some error occurred
>>> >> >         $("#stream").append('error<br/>');
>>> >> > }
>>> >> >
>>> >> >
>>> >> > At W2OGroup we have built utility libraries for receiving and processing
>>> >> > JSON object streams from DataSift in Storm/Kafka that I'm interested in
>>> >> > extending to work with Streams, and can probably commit to the project
>>> >> > if the community would find them useful.
>>> >> >
>>> >> >
>>> >> > Steve Blackmon
>>> >> > Director, Data Sciences
>>> >> >
>>> >> > 101 W. 6th Street
>>> >> > Austin, Texas 78701
>>> >> > cell 512.965.0451 | work 512.402.6366
>>> >> > twitter @steveblackmon
>>> >> >
>>> >> >
>>> >> > On 1/31/13 5:45 PM, "Craig McClanahan" <craigmcc@gmail.com> wrote:
>>> >> >
>>> >> >> We'll probably want some way to do the equivalent of ">", ">=", "<",
>>> >> >> "<=", and "!=" in addition to the implicit "equal" that I assume you
>>> >> >> mean in this example.
>>> >> >>
>>> >> >>Craig
>>> >> >>
>>> >> >>On Thu, Jan 31, 2013 at 3:39 PM, Jason Letourneau
>>> >> >> <jletourneau80@gmail.com> wrote:
>>> >> >>
>>> >> >>> I really like this - this is somewhat what I was getting at with the
>>> >> >>> {
>>> >> >>> "subscriptions":
>>> >> >>> [{"activityField":"value"},
>>> >> >>> {"activityField":"value",
>>> >> >>>  "anotherActivityField":"value" }
>>> >> >>> ]
>>> >> >>> }
>>> >> >>>
>>> >> >>> On Thu, Jan 31, 2013 at 4:32 PM, Craig McClanahan
>>> >> >>> <craigmcc@gmail.com>
>>> >> >>> wrote:
>>> >> >>> > On Thu, Jan 31, 2013 at 12:00 PM, Jason Letourneau
>>> >> >>> > <jletourneau80@gmail.com> wrote:
>>> >> >>> >
>>> >> >>> >> I am curious on the group's thinking about subscriptions to
>>> >> >>> >> activity streams.  As I am stubbing out the end-to-end heartbeat
>>> >> >>> >> on my proposed architecture, I've just been working with URL
>>> >> >>> >> sources as the subscription mode.  Obviously this is way
>>> >> >>> >> over-simplified.
>>> >> >>> >>
>>> >> >>> >> I know for Shindig the social graph can be used, but we don't
>>> >> >>> >> necessarily have that.  Considering the mechanism for establishing
>>> >> >>> >> a new subscription stream (defined as aggregated individual
>>> >> >>> >> activities pulled from a varying array of sources) is POSTing to
>>> >> >>> >> the Activity Streams server to establish the channel (currently
>>> >> >>> >> just subscriptions=url1,url2,url3 is the over-simplified
>>> >> >>> >> mechanism)...what would people see as a reasonable way to
>>> >> >>> >> establish subscriptions?  A list of userIds? Subjects?  How should
>>> >> >>> >> these be represented?  I was thinking of a JSON object, but does
>>> >> >>> >> anyone have other thoughts?
>>> >> >>> >>
>>> >> >>> >> Jason
>>> >> >>> >>
>>> >> >>> >
>>> >> >>> > One idea would be to take some inspiration from how JIRA lets you
>>> >> >>> > (in effect) create a WHERE clause that looks at any fields (in all
>>> >> >>> > the activities flowing through the server) that you want.
>>> >> >>> >
>>> >> >>> > Example filter criteria
>>> >> >>> > * provider.id = 'xxx' // Filter on a particular provider
>>> >> >>> > * verb = 'yyy'
>>> >> >>> > * object.type = 'blogpost'
>>> >> >>> > and you'd want to accept more than one value (effectively creating
>>> >> >>> > OR or IN type clauses).
>>> >> >>> >
>>> >> >>> > For completeness, I'd want to be able to specify more than one
>>> >> >>> > filter expression in the same subscription.
>>> >> >>> >
>>> >> >>> > Craig
>>> >> >>>
>>> >> >
>>> >>
>>
>>
