spark-user mailing list archives

From Tathagata Das <tathagata.das1...@gmail.com>
Subject Re: Possible approaches for adding extra metadata (Spark Streaming)?
Date Fri, 20 Jun 2014 20:20:11 GMT
If the metadata is directly related to each individual record, then it can
be done either way. Since I am not sure how easy or hard it will be for
you to add tags before putting the data into Spark Streaming, it's hard to
recommend one method over the other.

However, if the metadata is related to each key (the key on which you are
calling updateStateByKey) and not to every record, then it may be more
efficient to maintain that per-key metadata in the updateStateByKey state
object.
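A minimal sketch of the per-key approach follows. It assumes records arrive as (key, (sentiment_score, tags)) pairs, which is a hypothetical layout. The update function has the shape updateStateByKey expects: in the Scala API that is (Seq[V], Option[S]) => Option[S], and in PySpark it is a function of this batch's values for a key plus the previous state (or None). Here the state is a (count, total_sentiment, tags) tuple, so the tags seen for a key are kept once in the state instead of being carried on every record. run_batches is a hypothetical local driver that mimics the per-key grouping so the logic can be checked without a cluster.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical record value layout: (sentiment_score, tags).
Value = Tuple[float, frozenset]
# Hypothetical per-key state: (record count, summed sentiment, all tags seen).
State = Tuple[int, float, frozenset]


def update_state(new_values: List[Value], last_state: Optional[State]) -> Optional[State]:
    """Update function in the shape updateStateByKey expects: this batch's
    values for one key plus the previous state (None for a new key).
    Returning None would drop the key from the state."""
    if last_state is None and not new_values:
        return None  # nothing to track for this key
    count, total_sentiment, tags = last_state or (0, 0.0, frozenset())
    for score, record_tags in new_values:
        count += 1
        total_sentiment += score
        tags = tags | record_tags  # per-key metadata accumulated in the state
    return (count, total_sentiment, tags)


def run_batches(batches: Iterable[List[Tuple[str, Value]]]) -> Dict[str, State]:
    """Hypothetical local stand-in for updateStateByKey: groups each batch by
    key and applies update_state to every key that has state or new values."""
    state: Dict[str, State] = {}
    for batch in batches:
        grouped: Dict[str, List[Value]] = {}
        for key, value in batch:
            grouped.setdefault(key, []).append(value)
        for key in set(state) | set(grouped):
            new_state = update_state(grouped.get(key, []), state.get(key))
            if new_state is None:
                state.pop(key, None)
            else:
                state[key] = new_state
    return state
```

On a real stream one would call something like pairs.updateStateByKey(update_state) after enabling checkpointing with ssc.checkpoint(directory), since stateful operations need a checkpoint directory.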

Regarding HTTP calls, I would be a bit cautious about performance.
Making an HTTP call for every record is going to be quite expensive and
will reduce throughput significantly. If possible, cache values as much as
you can to amortize the cost of the HTTP calls.
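A minimal sketch of amortizing the HTTP calls with a per-process cache, assuming the lookup result is deterministic for a given input text. fetch_sentiment_over_http is a hypothetical stand-in for the real service call (here it only counts how often it is invoked), and enrich_partition is a hypothetical helper meant to be passed to a DStream's mapPartitions so that all records of a partition share the cache.

```python
from functools import lru_cache
from typing import Iterable, Iterator, Tuple

# Counts how many times the (hypothetical) remote service is actually called.
CALLS = {"count": 0}


def fetch_sentiment_over_http(text: str) -> float:
    # Hypothetical placeholder for a real HTTP request to a sentiment service.
    CALLS["count"] += 1
    return 1.0 if "good" in text else -1.0


@lru_cache(maxsize=10_000)
def cached_sentiment(text: str) -> float:
    # Repeated inputs are answered from memory instead of calling the service again.
    return fetch_sentiment_over_http(text)


def enrich_partition(records: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Attach a sentiment score to each record; intended for DStream.mapPartitions."""
    for text in records:
        yield (text, cached_sentiment(text))
```

With functools.lru_cache the cache lives in each Python worker process, so it only helps across batches if that process is reused; the bounded maxsize keeps memory in check when there are many distinct inputs.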

TD





On Fri, Jun 20, 2014 at 11:16 AM, Shrikar archak <shrikar84@gmail.com>
wrote:

> Hi All,
>
> I was curious to know which of the two approaches is better for doing
> analytics using Spark Streaming. Let's say we want to add some metadata,
> like sentiment, tags, etc., to the stream being processed and then
> perform some analytics using this added metadata.
>
> 1) Is it OK to make an HTTP call and add some extra information to the
> stream being processed in the updateByKeyAndWindow operations?
>
> 2) Add the sentiment/tags beforehand and then stream the data through DStreams.
>
> Thanks,
> Shrikar
>
>
