cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stu Hood (JIRA)" <>
Subject [jira] Commented: (CASSANDRA-2062) Use more efficient merge algorithm
Date Thu, 27 Jan 2011 10:25:43 GMT


Stu Hood commented on CASSANDRA-2062:

Also, as an important side note: the core reason for implementing this was to gain control
over the consumption of the lazy nested iterators in the read path.

We survive now because we write the size of the row at the front of the row (via some serious
acrobatics at write time), which gives us hasNext() for rows for free. But it became apparent
while working on the block-based format that hasNext() will not be cheap unless the current
item has been consumed. "Consumption" of the row is easy, and blocks will be framed so that
they can be very easily skipped, but you don't want to have to seek to the end of the row
to answer hasNext, and then seek back to the beginning to consume the row, which is what CollatingIterator
would have forced us to do.

There is actually one more roadblock to making this crazy iterator dance work: ReducingIterator
pulls one more item than it needs from its source before 'reducing' (consuming) the items
it is holding. This breaks the invariant, but it should be fixable by smashing ReducingIterator
into MergeIterator.

> Use more efficient merge algorithm
> ----------------------------------
>                 Key: CASSANDRA-2062
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Priority: Minor
>             Fix For: 0.7.2
>         Attachments: 0001-Improved-iterator-for-merging-sorted-iterators.txt, 0002-Quickie-instrumentation-for-comparisons.txt,
0003-Replace-Collating-with-Merge-in-CompactionIterator.txt, 0004-Port-LazilyCompactedRow-ReducingKeyIterator-RangeSlice.txt,
> For {{M}} iterators containing {{N}} total items, commons.collections.CollatingIterator
performs a {{O(M*N)}} merge, and calls hasNext multiple times per returned value. We can do

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message