You know what I’m really trying to do? I’m trying to come up with a best
practice technology stack. There are so many freaking projects it is
overwhelming. If I were to walk into an organization that had no Big Data
capability, what mix of projects would be best to implement based on
performance, scalability and ease of use/implementation? So far I’ve got:
Cassandra (Seems to be the highest performing NoSQL database out there.)
Python (Easier than Java. Maybe that shouldn’t be a concern.)
Hive (For people to leverage their existing SQL skillset.)
That would seem to cover transaction processing and warehouse storage and
the capability to do batch and real-time analysis. What am I leaving out, or what
do I have incorrect in my assumptions?
Sent: Wednesday, July 02, 2014 3:07 PM
Subject: Re: Spark vs. Storm
Spark Streaming discretizes the stream into batches at a configurable interval of
no less than 500 milliseconds. Therefore it is not appropriate for true real-time
processing. So if you need to capture events in the low hundreds of milliseconds
range or less, then stick with Storm (at least for now).
If you can afford a second or more of latency, then Spark provides the advantage of
interoperability with the other Spark components and capabilities.
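
To make the latency argument concrete, here is a rough sketch in plain Python of what discretizing a stream means. It is a toy model, not Spark's API: the function names, the sample events and the 500 ms interval are all made up for illustration, and it assumes a batch is only handed to processing once its interval closes.

```python
from collections import defaultdict


def discretize(events, batch_interval_ms):
    """Group (timestamp_ms, payload) events into fixed-width batches.

    Hypothetical helper: the batch index is the timestamp divided by the interval.
    """
    batches = defaultdict(list)
    for ts, payload in events:
        batches[ts // batch_interval_ms].append((ts, payload))
    return dict(batches)


def batch_latencies(events, batch_interval_ms):
    """Time each event waits until its batch closes (assumed model of the delay)."""
    waits = []
    for ts, _ in events:
        batch_end = (ts // batch_interval_ms + 1) * batch_interval_ms
        waits.append(batch_end - ts)
    return waits


# Made-up events: (arrival time in ms, payload)
events = [(0, "a"), (120, "b"), (499, "c"), (500, "d"), (980, "e")]

batches = discretize(events, 500)
latencies = batch_latencies(events, 500)

print(batches)
print(latencies)
print("worst-case wait before processing starts:", max(latencies), "ms")
```

In this model an event that arrives right at the start of an interval waits the whole interval before any processing begins, which is why a 500 ms floor on the interval rules out responses in the low hundreds of milliseconds. A per-event system like Storm does not have that built-in wait.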