cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9975) Flatten Iterator call hierarchy with a shared Transformer
Date Fri, 23 Oct 2015 11:08:28 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970829#comment-14970829
] 

Benedict commented on CASSANDRA-9975:
-------------------------------------

bq. Do you have any results showing what the stack used to look like vs what it looks like
now?

I don't follow, sorry. What do you mean by results? There aren't really any metrics I know
of to quantify this. Do you mean in the earlier version of this ticket, or prior to this ticket?

To describe qualitatively the change: currently, transformations are applied via classes extending
a wrapping iterator (often anonymous). This means that, on iteration, each step of each transformation
typically involves around 3 nested method calls that jump between the same abstract super
classes (so not at all clear which outer class you're actually stepping through), and so that
seeing where you are wrt the start/end of the transformation is extremely challenging. It
involves moving through the call stack (which can get dozens deep) to look behind you, and
walking through an inconsistent object graph to see what's ahead of you. Concatenation makes
this even worse, as it's not immediately obvious which of multiple iterators you will next
be consuming (or even that there are multiple options), and establishing this varies with
each different point of conatenation, as is somewhat true for each regular transformation.
The _source_ iterator is also heavily obscured as a result.

This patch changes the stack of transformations to be applied _iteratively_ (as opposed to
recursively) and be stored as an explicit array of helpfully-named objects, all within a single
iterator. So, the call depth is constant during the application of every update, and you can
see exactly which functions have been applied, and which are yet to be applied, both available
consistently via the same mechanism. The state of all of these functions is available side-by-side,
as is the underlying iterator, and you can easily step through all of them from a single point
of insertion (the {{BaseRow}} or {{BasePartition}} {{hasNext}} methods). 

This patch further flattens the concept of unfiltered/filtered internally, since there is
very little distinction. The external API still disambiguates between the two kinds of iteration,
but internally the transition is just another function to be applied, so again the transition
and control flow are easier to follow and debug.

Concatenation is also now consistently applied. It does have some slightly more complex logic
than I would have liked when transitioning from one iterator to the next, but none of that
needs to be stepped through or understood by anyone outside of modifiers to this logic.

Does that clarify sufficiently?

bq. You mention it improves performance do we have any results to support that?

Performance is an explicit non-goal of this ticket, and I have no expectation of improvement.
The latest patch improves performance _wrt the prior version_ which had a slight performance
regression due to increased GC when the transforming iterators were immutable (since this
meant a lot of array allocations). I have kicked off another comparison for confirmation.

> Flatten Iterator call hierarchy with a shared Transformer
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9975
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9975
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 3.0.0
>
>
> Stepping through a read response is made exceedingly difficult by the sheer depth of
the call hierarchy, and how rapidly your context jumps around. This ticket intend to partially
address that, by flattening one of the main causes of this: iterator transformations.
> I have a patch that attempts to mitigate (but not entirely eliminate) this, through the
introduction of a {{RowTransformer}} class that all transformations are applied through. If
a transformation has already been applied, the {{RowTransformer}} class does not wrap a new
iterator, but instead returns a new {{RowTransformer}} that wraps the original underlying
(untransformed) iterator and both transformations. This can accumulate an arbitrary number
of transformations and, quite importantly, can apply the filtration step {{Unfiltered ->
Row}}  in the same instance as well. The intention being that a majority of control flow happens
inside this {{RowTransformer}}, so there is far less context jumping to cope with.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message