spark-user mailing list archives

From Mich Talebzadeh <>
Subject Re: What is the difference between mini-batch vs real time streaming in practice (not theory)?
Date Tue, 27 Sep 2016 07:54:57 GMT
Replace "mini-batch" with "micro-batching" and search again. What is your
understanding of fraud detection?

Spark Streaming can be used effectively "in practice" for risk calculation
and fraud detection, including stopping fraud from going through (for
example, credit card fraud). It can even be used for Complex Event
Processing (CEP).
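
To make the detection-versus-prevention point concrete, here is a minimal
sketch in plain Python (not the Spark API) of a velocity rule evaluated once
per micro-batch. The record layout, the threshold max_txns_per_card and the
function names are hypothetical. Windows are simple tumbling windows aligned
to the batch interval.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical transaction record: (arrival_time_ms, card_id, amount)
Txn = Tuple[int, str, float]


def batch_index(ts_ms: int, interval_ms: int) -> int:
    # Which fixed (tumbling) micro-batch an arrival time falls into
    return ts_ms // interval_ms


def flag_cards_per_batch(
    txns: List[Txn], interval_ms: int, max_txns_per_card: int
) -> List[Tuple[int, List[str]]]:
    """For each non-empty micro-batch, return (batch_end_ms, cards over the limit)."""
    batches: Dict[int, List[str]] = defaultdict(list)
    for ts_ms, card_id, _amount in txns:
        batches[batch_index(ts_ms, interval_ms)].append(card_id)

    results: List[Tuple[int, List[str]]] = []
    for idx in sorted(batches):
        counts = Counter(batches[idx])
        flagged = sorted(card for card, n in counts.items() if n > max_txns_per_card)
        # The verdict only exists once the batch has closed
        results.append(((idx + 1) * interval_ms, flagged))
    return results


def decision_delay_ms(ts_ms: int, interval_ms: int) -> int:
    # Time a transaction waits for its batch to close (processing time not included)
    return (batch_index(ts_ms, interval_ms) + 1) * interval_ms - ts_ms


txns = [(100, "A", 10.0), (200, "A", 12.0), (300, "A", 9.0), (1500, "B", 5.0)]
print(flag_cards_per_batch(txns, 1000, 2))  # [(1000, ['A']), (2000, [])]
print(decision_delay_ms(100, 1000))          # 900
```

Card A is flagged, but only at the batch boundary: its first transaction
waited 900 ms for a verdict. A detection job can tolerate that gap; a
prevention decision that must approve or decline the transaction in flight
may not.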


Dr Mich Talebzadeh

LinkedIn

*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.

On 27 September 2016 at 08:12, kant kodali <> wrote:

> What is the difference between mini-batch and real-time streaming in
> practice (not theory)? In theory, I understand that mini-batch processing
> groups the data arriving within a given time frame, whereas real-time
> streaming processes each record as it arrives. My biggest question is: why
> not run mini-batches with an epsilon time frame (say one millisecond)? Put
> differently, I would like to understand why one would be a more effective
> solution than the other.
> I recently came across an example where mini-batch (Apache Spark) is used
> for fraud detection and real-time streaming (Apache Flink) is used for fraud
> prevention. Someone also commented that mini-batches would not be an
> effective solution for fraud prevention (since the goal is to prevent the
> transaction from occurring as it happens). Now I wonder why this wouldn't
> be as effective with mini-batches (Spark). Why is it not effective to run
> mini-batches with 1 millisecond latency? Batching is a technique used
> everywhere, including the OS and the kernel TCP/IP stack, where data going
> to the disk or the network is buffered. So what is the convincing factor
> that makes one more effective than the other?
> Thanks,
> kant
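
A toy cost model helps with the "why not a 1 ms micro-batch?" question. It
assumes, hypothetically, that every batch pays a fixed overhead (for example
job scheduling and task launch) on top of its per-event work. This is a
simple queueing argument, not a description of Spark's scheduler, and the
50 ms overhead and other figures are made up for illustration. Under that
assumption a batch must finish within its interval or a backlog grows, and an
event also waits on average half an interval for its batch to close.

```python
from dataclasses import dataclass


@dataclass
class MicroBatchModel:
    interval_ms: float    # batch interval B
    overhead_ms: float    # hypothetical fixed cost paid by every batch
    per_event_ms: float   # hypothetical processing cost per event
    events_per_ms: float  # arrival rate

    def load(self) -> float:
        # Fraction of wall-clock time spent on per-event work alone
        return self.per_event_ms * self.events_per_ms

    def batch_time_ms(self) -> float:
        # Fixed overhead plus the work for every event that arrived during the interval
        return self.overhead_ms + self.load() * self.interval_ms

    def is_stable(self) -> bool:
        # Each batch must finish before the next one is due, or a backlog builds up
        return self.batch_time_ms() <= self.interval_ms

    def avg_latency_ms(self) -> float:
        # On average an event waits half an interval for its batch to close,
        # then waits for the whole batch to be processed
        if not self.is_stable():
            return float("inf")
        return self.interval_ms / 2 + self.batch_time_ms()

    def min_stable_interval_ms(self) -> float:
        # Solve overhead + load * B <= B for B
        if self.load() >= 1:
            return float("inf")
        return self.overhead_ms / (1 - self.load())


for interval in (1, 100, 200, 1000):
    m = MicroBatchModel(interval, overhead_ms=50, per_event_ms=0.25, events_per_ms=2)
    print(interval, m.is_stable(), m.avg_latency_ms())
```

With these made-up numbers a 1 ms interval never keeps up, 100 ms is the
smallest interval the model accepts (average latency 150 ms), and a 1000 ms
interval averages 1050 ms. As with buffering in the OS or the TCP/IP stack,
batching amortises a fixed per-batch cost and trades latency for throughput,
so shrinking the interval toward zero removes the benefit while keeping the
overhead. A record-at-a-time engine skips the wait for a batch to close but
has its own per-record costs, which this model does not cover.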
