cassandra-user mailing list archives

From Jack Krupansky <>
Subject Re: Advice for asymmetric reporting cluster architecture
Date Sat, 17 Oct 2015 15:50:09 GMT
Did you consider DSE Search in a DC?

-- Jack Krupansky

On Sat, Oct 17, 2015 at 11:30 AM, Mark Lewis <> wrote:

> I've got an existing C* cluster spread across three data centers, and I'm
> wrestling with how to add some support for ad-hoc user reporting against
> (ideally) near real-time data.
> The types of reports I want to support basically boil down to allowing the
> user to select a single highly-denormalized "Table" from a predefined list,
> pick some filters (ideally with arbitrary boolean logic), project out some
> columns, and allow for some simple grouping and aggregation.  I've seen
> several companies expose reporting this way and it seems like a good way to
> avoid the complexity of joins while still providing a good deal of
> flexibility.
> Has anybody done this or have any recommendations?
> My current thinking is that I'd like to have the ad-hoc reporting
> infrastructure in separate data centers from our active production
> OLTP-type stuff, both to isolate any load away from the OLTP infrastructure
> and also because I'll likely need other stuff there (Spark?) to support
> ad-hoc reporting.
> So I basically have two problems:
> (1) Get an eventually-consistent view of the data into a data center I can
> query against relatively quickly (so no big batch imports).
> (2) Be able to run ad-hoc user queries against it.
> If I just think about query flexibility, I might consider dumping data
> into PostgreSQL nodes (practical because the data that any individual user
> can query will fit onto a single node).  But then I have the problem of
> getting the data there; I looked into an architecture using Kafka to pump
> data from the OLTP data centers to PostgreSQL mirrors, but down that road
> lies the need to manually deal with the eventual consistency.  Ugh.
> If I just run C* nodes in my reporting cluster that makes the problem of
> getting the data into the right place with eventual consistency easy to
> solve and I like that idea quite a lot, but then I need to run reporting
> against C*.  I could make the queries I need to run reasonably performant
> with enough secondary indexes or materialized views (we're upgrading to 3.0
> soon), but I would need a lot of secondary indexes and materialized views,
> and I'd rather not pay to store them in all of my data centers.  I wish
> there were a way to define secondary indexes or materialized views to only
> exist in one DC of a cluster, but unless I've missed something it doesn't
> look possible.
> Any advice or case studies in this area would be greatly appreciated.
> -- Mark
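
For reference, the report shape described in the quoted message (one denormalized table, filters with arbitrary boolean logic, a projection, simple grouping and aggregation) can be sketched in a few lines of Python. This is only an illustration over in-memory rows, not a C*, Spark, or DSE API; every name in it (run_report, eq, gt, all_of, any_of, negate, the sample columns) is hypothetical and made up for the example.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]

# Hypothetical filter combinators: small predicates that compose into
# arbitrary boolean expressions over a single denormalized row.
def eq(column: str, value: Any) -> Predicate:
    return lambda row: row.get(column) == value

def gt(column: str, value: Any) -> Predicate:
    return lambda row: row.get(column) is not None and row[column] > value

def all_of(*preds: Predicate) -> Predicate:
    return lambda row: all(p(row) for p in preds)

def any_of(*preds: Predicate) -> Predicate:
    return lambda row: any(p(row) for p in preds)

def negate(pred: Predicate) -> Predicate:
    return lambda row: not pred(row)

# Supported aggregate functions, keyed by name.
AGGREGATES: Dict[str, Callable[[List[Any]], Any]] = {
    "count": len,
    "sum": sum,
    "min": min,
    "max": max,
}

def run_report(
    rows: Iterable[Row],
    where: Predicate,
    group_by: List[str],
    aggregates: Dict[str, Tuple[str, str]],
) -> List[Row]:
    """Filter rows, group them by the given columns, and aggregate.

    aggregates maps an output column name to (function name, source column).
    """
    groups: Dict[Tuple[Any, ...], List[Row]] = defaultdict(list)
    for row in rows:
        if where(row):
            groups[tuple(row[c] for c in group_by)].append(row)

    report = []
    for key, members in groups.items():
        out: Row = dict(zip(group_by, key))  # projected grouping columns
        for name, (fn, column) in aggregates.items():
            out[name] = AGGREGATES[fn]([m[column] for m in members])
        report.append(out)
    return report

# Made-up sample data standing in for one denormalized "Table".
rows = [
    {"region": "EU", "status": "paid", "amount": 10},
    {"region": "EU", "status": "paid", "amount": 5},
    {"region": "US", "status": "paid", "amount": 7},
    {"region": "US", "status": "void", "amount": 100},
    {"region": "APAC", "status": "open", "amount": 50},
]

# (status = 'paid' AND amount > 4) OR region = 'APAC'
where = any_of(all_of(eq("status", "paid"), gt("amount", 4)), eq("region", "APAC"))

report = run_report(
    rows,
    where,
    group_by=["region"],
    aggregates={"orders": ("count", "amount"), "total": ("sum", "amount")},
)
```

Whatever engine ends up serving the reporting DC would need to evaluate the equivalent of this per user; the sketch only pins down the query shape, and says nothing about how to make it fast at scale.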
