cassandra-user mailing list archives

From Peter Lin <wool...@gmail.com>
Subject Re: Cassandra for Analytics?
Date Thu, 18 Dec 2014 15:14:31 GMT
@Colin -
I bounce back and forth on classifying Storm and Spark as stream processing
frameworks. Clearly they are marketed as stream processing frameworks, and
they can process data streams. Even with the commercial stream processing
products, expressing joins in some of them is a bit "quirky", to
put it nicely. The StreamSQL-based products tend to be easier for end
users to grok, but it's still not an ideal way of expressing temporal
patterns and temporal queries.
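
To make the "temporal join" point concrete, here is a minimal sketch in plain
Python of a time-windowed join between two event streams. It is not the API of
Storm, Spark, Esper or any commercial product; the function name
windowed_join, the (timestamp, key, value) event shape and the sample data are
all made up for illustration, and it assumes both inputs are already ordered
by timestamp.

```python
import heapq
from collections import defaultdict, deque


def windowed_join(left, right, window):
    """Join two time-ordered streams of (timestamp, key, value) events.

    Yields (key, left_event, right_event) for every pair that shares a key
    and whose timestamps differ by at most `window`. Hypothetical helper,
    not taken from any stream processing library.
    """
    # Tag each event with its side (0 = left, 1 = right) and merge by time.
    tagged_left = ((ts, 0, key, val) for ts, key, val in left)
    tagged_right = ((ts, 1, key, val) for ts, key, val in right)

    # One buffer of recent events per side, indexed by join key.
    buffers = (defaultdict(deque), defaultdict(deque))

    for ts, side, key, val in heapq.merge(tagged_left, tagged_right):
        other = buffers[1 - side][key]
        # Evict events on the other side that have fallen out of the window.
        # Eviction is lazy: a buffer is only trimmed when a matching key
        # arrives on the opposite side, so a real system would also need a
        # periodic cleanup.
        while other and ts - other[0][0] > window:
            other.popleft()
        # Everything still buffered on the other side is a match.
        for other_ts, other_val in other:
            if side == 0:
                yield key, (ts, val), (other_ts, other_val)
            else:
                yield key, (other_ts, other_val), (ts, val)
        buffers[side][key].append((ts, val))


# Illustrative data only: clicks joined to orders within 5 time units.
clicks = [(1, "u1", "click-a"), (5, "u2", "click-b"), (12, "u1", "click-c")]
orders = [(3, "u1", "order-x"), (20, "u2", "order-y")]
matches = list(windowed_join(clicks, orders, window=5))
```

Even this simple case needs per-key buffering and eviction logic, which is
roughly the state management a product has to hide behind its join syntax.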


That's the reason I always tell our customers to figure out their use case
first, though most of them respond with "we don't know the use case, but we
know we want to use it".


On Thu, Dec 18, 2014 at 10:02 AM, Colin <colin@clark.ws> wrote:
>
> Almost every stream processing system I know of offers joins out of the
> box and has done so for years....
>
> Even open source offerings like Esper have offered joins for years.
>
> The ones that haven't are systems like Storm, Spark, etc., which I don't
> really classify as stream processors anyway.
>
>
>
> --
> *Colin Clark*
> +1-320-221-9531
>
>
> On Dec 18, 2014, at 1:52 PM, Peter Lin <woolfel@gmail.com> wrote:
>
> that depends on what you mean by real-time analytics.
>
> For things like continuous data streams, neither are appropriate platforms
> for doing analytics. They're good for storing the results (aka output) of
> the streaming analytics. I would suggest before you decide cassandra vs
> hbase, first figure out exactly what kind of analytics you need to do.
> Start with prototyping and look at what kind of queries and patterns you
> need to support.
>
> Neither HBase nor Cassandra is good for complex patterns that do joins or
> cross joins (aka MDX), so with either one you have to re-invent stuff.
>
> Most of the event processing and stream processing products out there also
> don't support joins or cross joins very well, so any solution is going to
> need several different components. Typically, stream processing does
> filtering, which feeds another system that does simple joins. The output of
> the second step can then go to another system that does MDX-style queries.
>
> spark streaming has basic support, but it's not as mature and feature rich
> as other stream processing products.
>
> On Wed, Dec 17, 2014 at 11:20 PM, Ajay <ajay.garga@gmail.com> wrote:
>>
>> Hi,
>>
>> Can Cassandra be used for real-time analytics, and is it a good fit? I went
>> through a couple of benchmarks of Cassandra vs. HBase (most of them done 3
>> years ago), and they mentioned that Cassandra is designed for intensive
>> writes and has higher read latency than HBase. In our case, we will have
>> both writes and reads, with reads dominating (say 40% writes and 60%
>> reads). We are planning to use Spark as the in-memory computation engine.
>>
>> Thanks
>> Ajay
>>
>
