cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8731) Optimise merges involving multiple clustering columns
Date Wed, 04 Feb 2015 16:18:35 GMT


Benedict commented on CASSANDRA-8731:

Well, the simplest implementation is very trivial: Instead of one PQ, you have a PQ for each
level of clustering column. Really not very advanced at all.

If you want to split intra-clustering-component to make byte-order comparable fields more
efficient, that is more involved.

That said, there is no harm in having both. It seems to me that it may be more involved to
do what you suggest, though, especially for sstables that _do_ overlap, but don't overlap
for their entirety. At the moment we just lump all our iterators together; slicing them and
merging them independently actually sounds to me to be much more fiddly and tough to get right
than the simplest approach outlined here.

> Optimise merges involving multiple clustering columns
> -----------------------------------------------------
>                 Key: CASSANDRA-8731
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>              Labels: performance
>             Fix For: 3.0
> Since the new storage format is dead in the water for the moment, we should do our best
to optimise current behaviour. When merging data from multiple sstables with multiple clustering
columns, currently we must incur the full costs of comparison for the entire matching prefix,
and must heapify every cell in our PriorityQueue, incurring lg(N) of these costlier comparisons
for every cell we merge, where N is the number of sources we're merging.
> Essentially I'm proposing a trie-based merge approach as a replacement for the ManyToOne
MergeIterator, wherein we treat each clustering component as a tree underwhich all Cells with
a common prefix occur. We then perform a tree merge, rather than a flat merge. For byte-order
fields this trie can even be a full binary-trie (although built on the fly). The advantage
here is that we rapidly prune merges involving disjoint ranges, so that instead of always
incurring lg(N) costs on each new record, we may often incur O(1) costs. For timeseries data,
for instance, we could merge dozens of files and so long as they were non-overlapping our
CPU burden would be little more than reading from a single file.
> On top of this, we no longer incur any of the shared prefix repetition costs, since we
compare each prefix piece-wise, and only once.

This message was sent by Atlassian JIRA

View raw message