streams-dev mailing list archives

From Steve Blackmon <st...@blackmon.org>
Subject Re: Desired behavior for DataSift user_mention serialization
Date Mon, 30 Jun 2014 21:59:24 GMT
While that approach may result in mention objects with ids/names that were originally paired,
we can't guarantee that without making an API lookup to twitter.

In general I’m in favor of streams maintaining and attempting to improve data accuracy.
In this scenario, though, the inbound document has been degraded in a way that goes beyond
the current scope and authority (API-wise) of the module to resolve. Given that, I’m
wary of setting the extension fields in a way that could recombine fields incorrectly
and thus make the problem worse.

So my vote would be to create a separate object for each id and each name in every case,
maintaining all of the original information, and leave it to a downstream processor to
improve the metadata if there is value in doing so.
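
A minimal sketch of that option, written in Python for brevity rather than as the
module's actual code. The function name separate_mentions and the keys "displayName"
and "id" are assumptions for illustration, not the real UserMention fields:

```python
from typing import Dict, List, Optional, Union

# Hypothetical stand-in for a UserMention object: a dict with exactly one key.
Mention = Dict[str, Union[str, int]]


def separate_mentions(
    mentions: Optional[List[str]],
    mention_ids: Optional[List[int]],
) -> List[Mention]:
    """Emit one object per handle and one per id, never pairing the two.

    Nothing from the inbound document is dropped, and no handle is ever
    attached to an id it might not belong to.
    """
    objects: List[Mention] = [{"displayName": handle} for handle in (mentions or [])]
    objects += [{"id": user_id} for user_id in (mention_ids or [])]
    return objects
```

Because the lists are never zipped by position, the output has the same shape whether
or not they happen to be the same length, so a downstream processor can attempt any
pairing on its own terms.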

Steve Blackmon
steve@blackmon.org



On Jun 30, 2014, at 9:50 AM, Robert Douglas <robert.baker.douglas@gmail.com> wrote:

> Hi all,
> 
> I’m currently working on cleaning up the implementation of the DataSift
> serializer and have come upon an issue. The data that we get back in a
> DataSift Interaction object contains two fields, mentions (which has all
> the handles for mentioned users) and mention_ids (which has all the Ids for
> mentioned users). Problem is, there is no guarantee that these two lists
> will be the same size. My current solution is to merge together the handles
> and Ids into individual UserMention objects whenever the mentions and
> mention_ids lists are the same size. In the event that those lists are not
> the same size, I create UserMention objects for every entry in both lists.
> 
> Does anyone have a different opinion on how this should be handled?
> 
> — Robert

