spark-user mailing list archives

From Hari Polisetty
Subject Pros and Cons of Spark Batch over Streaming for processing data queried from Elastic Search
Date Fri, 10 Apr 2015 16:29:33 GMT
I have a large data set (tens of millions of records, about 1 KB each) indexed in
Elasticsearch. I need to fetch the records that match a given query and then post-process
them to generate something like a word frequency list. All of this has to be done
programmatically, in response to a client request to a RESTful service, because I need the
results back in the request handler for further processing.
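
For concreteness, the post-processing step amounts to roughly the following. This is only a
minimal sketch in plain Python, with no Spark or Elasticsearch calls; the record shape, the
field name `text`, and the helper `word_frequencies` are assumptions made for illustration.

```python
import re
from collections import Counter


def word_frequencies(records, field="text"):
    """Count word occurrences in one field across an iterable of dict records."""
    counts = Counter()
    for record in records:
        # Lowercase, then pull out runs of letters and apostrophes.
        # A real tokenizer would likely be more careful than this.
        words = re.findall(r"[a-z']+", record.get(field, "").lower())
        counts.update(words)
    return counts


# Hypothetical records standing in for the hits returned by a query.
sample = [
    {"text": "Spark batch or Spark streaming"},
    {"text": "batch mode"},
]
freq = word_frequencies(sample)
print(freq.most_common(2))
```

In Spark the same shape would be spread across partitions, but the counting logic itself
stays this simple; the open question is only how the records get delivered to it.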

I am wondering whether it is better to implement this with Spark Streaming or with plain
Spark in batch mode, and I would like to know the pros and cons of each approach. I have
already implemented a solution using Spark Streaming, but I am willing to rework it for
batch mode if that has any performance advantage.
