streams-dev mailing list archives

From "Steve Blackmon [W2O Digital]" <sblack...@w2odigital.com>
Subject Re: More Cassandra Branch Updates
Date Wed, 18 Sep 2013 19:54:33 GMT
Danny,

I looked through streams-cassandra this morning and noticed that you used
CQL to insert/select from subscription and activity column families that
are shared by all publishers and subscribers.  There's nothing wrong with
this approach at moderate scale (up to millions of messages), but I am
certain that using hector/intravert would perform better at large data
volumes for certain common use cases.

Also, note that we don't necessarily need the activity schema to be fixed
by the Repository. C* can store serialized nested objects within
(compressed) byte buffers and nested objects as row and column keys via
composites, and it provides hooks for evaluating certain values within the
keys at slice/select time, without expecting that every field must always
be present (think extensions).

For instance, a hector/iv cassandra persister could create a row for
every subscription and write the entire activity object into a dedicated
row for each subscriber, using timestamp (or a more complex time+metadata
composite) as the columnid.  This pattern works well when
getTimeline(topicid/userid, start:optional, end:optional, filter:optional)
or some variant is called frequently, even as user and message counts
grow, because the entire potential payload for the request is on a single
node pre-sorted on disk.
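
To make the pattern above concrete, here is a rough sketch in Java. It is
not hector or intravert code and it talks to no cluster: it is only an
in-memory model in which a sorted map stands in for one wide row per
subscriber. The class name TimelineSketch, the write method, and the use of
a String for the serialized activity are all made up for illustration;
getTimeline follows the signature mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;

/** Hypothetical in-memory model of "one wide row per subscriber". Not a Cassandra client. */
class TimelineSketch {
    // row key (subscriber id) -> columns kept sorted by column id (timestamp) -> serialized activity
    private final Map<String, NavigableMap<Long, String>> rows = new HashMap<>();

    /** Fan-out on write: copy the entire activity into each subscriber's own row. */
    public void write(List<String> subscriberIds, long timestamp, String activityJson) {
        for (String id : subscriberIds) {
            // A bare timestamp column id overwrites on collision; a time+metadata
            // composite (as suggested above) would avoid that.
            rows.computeIfAbsent(id, k -> new TreeMap<>()).put(timestamp, activityJson);
        }
    }

    /** Slice a single row. Null bounds mean unbounded; a null filter keeps everything. */
    public List<String> getTimeline(String subscriberId, Long start, Long end,
                                    Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        NavigableMap<Long, String> row = rows.get(subscriberId);
        if (row == null) {
            return out;
        }
        long lo = (start == null) ? Long.MIN_VALUE : start;
        long hi = (end == null) ? Long.MAX_VALUE : end;
        if (lo > hi) {
            return out; // empty range; subMap would throw on inverted bounds
        }
        // The columns are already in timestamp order, so a range read is a contiguous scan.
        for (String activity : row.subMap(lo, true, hi, true).values()) {
            if (filter == null || filter.test(activity)) {
                out.add(activity);
            }
        }
        return out;
    }
}
```

The point of the sketch is that a read touches exactly one row and walks it
in stored order; the cost is paid at write time, when each activity is
copied once per subscriber.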

Serialization techniques such as this are what I was referring to when
suggesting that CQL may not be ideal.  That said, I've used a CQL approach
much like this when getting started or not expecting multiple millions of
rows and it does work.  Kudos for making time to put together a working
scalable persister.  We are about to need an intermediate message
persistence layer smarter than kafka - we will deploy streams as we do and
commit back whatever code we write when it's ready.

Fortunately the way this project is structured (modular, loosely-coupled,
etc...) anyone can implement persistence strategies and each system
administrator can select which backends/libraries to use for each
participant in their flow based on expected/empirical usage patterns!  In
theory... :)

Steve Blackmon



On 9/9/13, 3:27 PM, "Danny Sullivan" <dsullivan7@hotmail.com> wrote:

>The cassandra branch under
>https://svn.apache.org/repos/asf/incubator/streams/branches/cassandra/ is
>at the point where I'd like to start the conversation about merging it
>with trunk. It works by taking in all activity off the activemq and
>storing it in the cassandra database. The new activity aggregator has a
>method distributeToSubscribers which runs on a timer every 30 seconds and
>pushes relevant activity to all subscribers based on the filters they've
>specified and last time that their streams were updated.
>For my branch I used the DataStax java driver. I'd like to keep a couple
>chips in the CQL driver pot, at least until Intravert has a stable
>release and the Cassandra wiki retracts the recommendation of exclusive
>use of CQL clients.
>I've been using cassandra 1.2 for running it.
>I'd like to hear what everyone thinks and if there's support, perhaps we
>can merge the branch.
>Danny



