cassandra-dev mailing list archives

From benjamin roth <brs...@gmail.com>
Subject Re: State of triggers
Date Sun, 05 Mar 2017 08:05:04 GMT
While I was reading the MV paragraph in your post, an idea popped up:

The problem with MV inconsistencies and inconsistent range movement is that
the "MV contract" is broken. This only happens because base data and view
replica data reside on different hosts. If base data and view replicas
stayed on the same host, then a rebuild/remove would always stream both
matching parts of a base table + MV.

So my idea:
Why not make a view replica ALWAYS stay local, regardless of where the token
of an MV would point? That would solve these problems:
1. Rebuild / remove node would not break MV contract
2. A write always stays local:

a) That means replication happens synchronously. That means a quorum write to
the base table guarantees instant data availability with a quorum read on a view

b) It saves network roundtrips + request/response handling and helps to
keep a cluster healthier in case of bulk operations (like repair streams or
rebuild streams). Write load stays local and is not spread across the whole
cluster. I think it makes the load in these situations more predictable.
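Point 2a rests on ordinary quorum overlap. A minimal sketch, assuming (as proposed) that the view row is written on the same replica, in the same local write, as the base row: every replica that acknowledges the base write then also holds the view update, so the usual rule W + R > RF carries over to view reads. QUORUM is computed as floor(RF / 2) + 1 for a single datacenter; the class and method names are illustrative only.

```java
// Sketch of the quorum arithmetic behind point 2a.
// Assumption (hypothetical): view updates are applied synchronously on the
// same replicas as the base write, so each base ack implies a view ack.
final class QuorumOverlap {

    /** Single-DC QUORUM: floor(rf / 2) + 1 replicas. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** A read of r replicas must intersect a write acked by w replicas iff w + r > rf. */
    static boolean readSeesWrite(int w, int r, int replicationFactor) {
        return w + r > replicationFactor;
    }

    public static void main(String[] args) {
        for (int rf = 1; rf <= 5; rf++) {
            int q = quorum(rf);
            System.out.printf("RF=%d QUORUM=%d overlap=%b%n", rf, q, readSeesWrite(q, q, rf));
        }
    }
}
```

With RF=3, a quorum write (2 acks) and a quorum read (2 replicas) must share at least one node; a ONE/ONE combination (1 + 1 = 2) does not.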

How can that be achieved? I haven't done any "scientific research" yet, but I
guess an "MV partitioner" could do the trick. Instead of applying the
regular partitioner, an MV partitioner would calculate the PK of the base
table (which is always possible, since a view's primary key must contain
every base primary key column) and then apply the regular partitioner.
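The proposed placement rule can be sketched as follows. This is a hypothetical illustration, not Cassandra code: rows are plain maps, and `regularToken` is a stand-in 64-bit hash rather than Cassandra's Murmur3 partitioner. The only point it shows is that hashing the base partition key recovered from the view row yields the same token, and therefore the same replicas, as the base row.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the proposed "MV partitioner". Nothing here is
// Cassandra's API; regularToken() is a stand-in for the real partitioner.
final class MvPartitionerSketch {

    /** Stand-in for the regular partitioner (NOT Murmur3): a deterministic 64-bit hash. */
    static long regularToken(List<String> partitionKeyValues) {
        long h = 1125899906842597L;
        for (byte b : String.join("\u0000", partitionKeyValues).getBytes(StandardCharsets.UTF_8)) {
            h = 31 * h + b;
        }
        return h;
    }

    /** Pulls the values of the given key columns out of a row, in key order. */
    static List<String> keyValues(Map<String, String> row, List<String> keyColumns) {
        return keyColumns.stream().map(row::get).collect(Collectors.toList());
    }

    /** Current behaviour: a view row is placed by the view's own partition key. */
    static long viewTokenToday(Map<String, String> viewRow, List<String> viewPartitionKey) {
        return regularToken(keyValues(viewRow, viewPartitionKey));
    }

    /**
     * Proposed behaviour: recover the base partition key from the view row (always
     * present, because the view's primary key contains the base primary key) and
     * hash that instead, so the view row lands on the base row's replicas.
     */
    static long viewTokenProposed(Map<String, String> viewRow, List<String> basePartitionKey) {
        return regularToken(keyValues(viewRow, basePartitionKey));
    }
}
```

For example, with a base table partitioned by `user_id` and a view partitioned by `game`, a view row `{user_id=u1, game=g1}` is placed by `g1` today but by `u1` under the proposal, which is exactly where the base row lives.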

I'll create a proper Jira for it on Monday. Currently it's Sunday here and
my family wants me back, so just a few thoughts on this right now.

Any feedback is appreciated!

2017-03-05 6:34 GMT+01:00 Edward Capriolo <edlinuxguru@gmail.com>:

> On Sat, Mar 4, 2017 at 10:26 AM, Jeff Jirsa <jjirsa@gmail.com> wrote:
>
> >
> >
> >
> > > On Mar 4, 2017, at 7:06 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> > >
> > >> On Fri, Mar 3, 2017 at 12:04 PM, Jeff Jirsa <jjirsa@gmail.com> wrote:
> > >>
> > >> On Fri, Mar 3, 2017 at 5:40 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> > >>
> > >>>
> > >>> I used them. I built do-it-yourself secondary indexes with them. They
> > >>> have their gotchas, but so do all the secondary index implementations.
> > >>> Just because DataStax does not write about something doesn't mean
> > >>> nobody uses it. Let's see, like 5 years ago there was this:
> > >>> https://github.com/hmsonline/cassandra-triggers
> > >>>
> > >>>
> > >> Still in use? How'd it work? Production ready? Would you still do it
> > >> that way in 2017?
> > >>
> > >>
> > >>> There is a fairly large divergence between what actual users do and
> > >>> what other groups 'say' actual users do in some cases.
> > >>>
> > >>
> > >> A lot of people don't share what they're doing (for business reasons,
> > >> or because they don't think it's important, or because they don't know
> > >> how/where), and that's fine, but it makes it hard for anyone to know
> > >> what features are used, or how well they're really working in
> > >> production.
> > >>
> > >> I've seen a handful of "how do we use triggers" questions in IRC, and
> > >> they weren't unreasonable questions, but they seemed like a lot of pain,
> > >> and more than one of those people ultimately came back and said they
> > >> used some other mechanism (and of course, some of them silently
> > >> disappear, so we have no idea if it worked or not).
> > >>
> > >> If anyone's actively using triggers, please don't keep it a secret.
> > >> Knowing that they're being used would be a great way to justify
> > >> continuing to maintain them.
> > >>
> > >> - Jeff
> > >>
> > >
> > > "Still in use? How'd it work? Production ready? Would you still do it
> > > that way in 2017?"
> > >
> > > I mean, that is a loaded question. How long has Cassandra had secondary
> > > indexes? Did they work well? Would you use them? How many times were
> > > they rewritten?
> >
> > It wasn't really meant to be a loaded question; I was being sincere.
> >
> > But I'll answer: secondary indexes suck for many use cases, but they're
> > invaluable for their actual intended purpose, and I have no idea how many
> > times they've been rewritten, but they're production ready for their
> > narrow use case (defined by cardinality).
> >
> > Is there a real triggers use case still? Alternative to MVs? Alternative
> > to CDC? I've never implemented triggers; since you have, what's the level
> > of surprise for the developer?
>
>
> :) You mention alternatives. Let's break them down.
>
> MV:
> They seem to have a lot of promise, i.e. you can use them for things other
> than equality searches, and I do think the CQL example with the top N high
> scores is pretty useful. Then again, our buddy Mr. Roth has a thread named
> "Rebuild / remove node with MV is inconsistent". I actually think a lot of
> the use cases for MVs fall into the category of "something you should
> actually be doing with Storm". I can vibe with the concept of not needing a
> streaming platform, but I KNOW Storm would do this correctly. I don't want
> to land on something like secondary index v1/v2, where there were
> fundamental flaws at scale (not saying this is the case, but the rebuild
> thing seems a bit scary).
>
> CDC:
> I'm slightly afraid of this. Rationale: it is an extensible piece designed
> specifically for a closed-source implementation of hub-and-spoke
> replication. I have some experience trying to "play along" with extensible
> things:
> https://issues.apache.org/jira/browse/CASSANDRA-12627
> "Thus, I'm -1 on {[PropertyOrEnvironmentSeedProvider}}."
>
> Not a rub, but I can't even get something committed using an existing
> extensible interface. Heaven forbid a use case I have would want to
> *change* the interface; I would probably get a -12. So I have no desire to
> try and maintain a CDC implementation. I see myself falling into the same
> old "why do you want to do this? -1" trap.
>
> Coordinator Triggers:
> To bring things back really old-school: coordinator triggers, which everyone
> always wanted. In a nutshell, I DO believe they are easier to reason about
> than MVs. It is pretty basic: it happens on the coordinator, there are no
> batchlogs or whatever, and it is best effort, possibly requiring more nodes,
> as the keys might be on different servers. Actually, I tend to like
> features like that. Once something comes on the downswing of the "software
> hype cycle", you know it is pretty stable, as everyone's all excited about
> other things.
>
> As I said, I know I can use Storm for top-N, so what is this feature for?
> Well, I want to optimize my network transfer generally by building my batch
> mutations on the server. Seems reasonable. Maybe I want to have my own
> little "read before write" thing like CQL lists.
>
> The warts, having tried it: the first time I tried it, I found it did not
> work with non-batches; I patched it in 3 hours. It took weeks before some
> CQL user had the same problem and it got fixed :) There was no dynamic
> stuff at the time, so it was BYO class loader. Going against the grain and saying.
>
> The thing you have to realize with best-effort coordinator triggers is
> that the "transaction" could be incomplete, and well, that sucks for some
> cases maybe. But I actually felt the secondary index implementations force
> all problems into a type of "foreign key transactional integrity" that does
> not make sense for Cassandra.
>
> Have you ever used Elasticsearch? Their version of consistency is: write
> something, keep reading, and eventually you see it. Wildly popular :) It
> is a crazy world.
>
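Capriolo's description of coordinator triggers (build the extra mutations on the server, apply them best effort, accept that the "transaction" may be incomplete) can be sketched as a toy model. All names below (`Mutation`, `Trigger`, `EmailIndexTrigger`, `users_by_email`) are hypothetical and are not Cassandra's classes; the hook is only loosely shaped like Cassandra's `ITrigger`, whose `augment` method returns additional mutations. Requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Predicate;

// Toy model of a best-effort coordinator trigger. All types are hypothetical;
// none of this is Cassandra's API.
final class CoordinatorTriggerSketch {

    /** A simplified write: table, partition key, one column and its value. */
    record Mutation(String table, String partitionKey, String column, String value) {}

    /** Hypothetical trigger contract: given the incoming write, return extra writes. */
    interface Trigger {
        Collection<Mutation> augment(Mutation update);
    }

    /** DIY secondary index: maintain a users_by_email lookup row for each email write. */
    static final class EmailIndexTrigger implements Trigger {
        @Override
        public Collection<Mutation> augment(Mutation update) {
            List<Mutation> extra = new ArrayList<>();
            if (update.table().equals("users") && update.column().equals("email")) {
                // The index row is keyed by the email and points back at the user key.
                extra.add(new Mutation("users_by_email", update.value(), "user_id", update.partitionKey()));
            }
            return extra;
        }
    }

    /** The coordinator builds the whole batch server-side: original write plus trigger output. */
    static List<Mutation> buildBatch(Mutation update, List<Trigger> triggers) {
        List<Mutation> batch = new ArrayList<>();
        batch.add(update);
        for (Trigger trigger : triggers) {
            batch.addAll(trigger.augment(update));
        }
        return batch;
    }

    /**
     * Best effort: each write is sent on its own, with no batchlog and no rollback.
     * `replicaAccepts` stands in for whether the node owning that key acknowledged it.
     * Returns the writes that failed; an empty list means the "transaction" completed.
     */
    static List<Mutation> applyBestEffort(List<Mutation> batch, Predicate<Mutation> replicaAccepts) {
        List<Mutation> failed = new ArrayList<>();
        for (Mutation m : batch) {
            if (!replicaAccepts.test(m)) {
                failed.add(m); // earlier successful writes stay applied
            }
        }
        return failed;
    }
}
```

If the `users_by_email` write is the one that fails, the `users` row is still written and the lookup row is simply missing, with no batchlog to replay it. That is the incomplete-"transaction" trade-off described above.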
