cassandra-commits mailing list archives

From "Stu Hood (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-2062) Better control of iterator consumption
Date Tue, 14 Jun 2011 23:10:47 GMT


Stu Hood updated CASSANDRA-2062:

    Attachment: 0004-CASSANDRA-2062-0004-Replace-ReducingIterator-for-lazy-.txt

bq. [CollatingIterator] calls hasNext() on its child iterator immediately after pulling off
the least value from one.
ReducingIterator also does not know whether there are more items until it has consumed past
the end of the current item, which is why it was necessary to squash it in.
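To illustrate the point about consuming past the end of the current item, here is a minimal sketch of a grouping iterator over a sorted stream. It is not the ReducingIterator from the Cassandra source; the class name {{GroupingSketch}} and its fields are hypothetical. To know that a group is complete, it has to pull one item beyond the group and hold it as lookahead. The supplied comparator alone decides which items count as equal.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch, not Cassandra's ReducingIterator.
final class GroupingSketch<T> implements Iterator<List<T>> {
    private final Iterator<T> source;
    private final Comparator<? super T> cmp;
    // First item of the next group, already pulled from the source.
    private T lookahead;
    private boolean hasLookahead;

    GroupingSketch(Iterator<T> source, Comparator<? super T> cmp) {
        this.source = source;
        this.cmp = cmp;
        if (source.hasNext()) {
            lookahead = source.next();
            hasLookahead = true;
        }
    }

    @Override
    public boolean hasNext() {
        return hasLookahead;
    }

    @Override
    public List<T> next() {
        if (!hasLookahead) throw new NoSuchElementException();
        List<T> group = new ArrayList<>();
        group.add(lookahead);
        hasLookahead = false;
        while (source.hasNext()) {
            T item = source.next();
            if (cmp.compare(item, group.get(0)) == 0) {
                group.add(item); // still inside the current group
            } else {
                // We only learn the group ended by reading one item past it.
                lookahead = item;
                hasLookahead = true;
                break;
            }
        }
        return group;
    }
}
```

Passing {{String.CASE_INSENSITIVE_ORDER}} as the comparator, for example, would put "a" and "A" into the same group, since equality is whatever the comparator reports as 0.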

bq. we still have the RI use in collectCollatedColumns [...] as well as LazyColumnIterator
Fixed in 0003 and removed in 0004.

bq. It looks to me like the main obstacle to using MI there is making MI.Reducer support customizable
The comparator passed to the MI is used for isEqual, so it should be pluggable already.

> Better control of iterator consumption
> --------------------------------------
>                 Key: CASSANDRA-2062
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>            Priority: Minor
>             Fix For: 1.0
>         Attachments: 0001-CASSANDRA-2062-0001-Improved-iterator-for-merging-sort.txt,
0002-CASSANDRA-2062-0002-Port-all-collating-consumers-to-Me.txt, 0003-CASSANDRA-2062-0003-Replace-ReducingIterator-in-QueryF.txt,
> The core reason for this ticket is to gain control over the consumption of the lazy nested
iterators in the read path.
> {quote}We survive now because we write the size of the row at the front of the row (via
some serious acrobatics at write time), which gives us hasNext() for rows for free. But it
became apparent while working on the block-based format that hasNext() will not be cheap unless
the current item has been consumed. "Consumption" of the row is easy, and blocks will be framed
so that they can be very easily skipped, but you don't want to have to seek to the end of
the row to answer hasNext, and then seek back to the beginning to consume the row, which is
what CollatingIterator would have forced us to do.{quote}
> While we're at it, we can also improve efficiency: for {{M}} iterators containing {{N}}
total items, commons.collections.CollatingIterator performs an {{O(M*N)}} merge and calls
hasNext() multiple times per returned value. We can do better.
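As a rough illustration of the two goals in the description, the following sketch merges {{M}} sorted iterators with a binary heap, which gives {{O(N log M)}} comparisons overall, and it defers advancing a child iterator until the caller asks for the next item. It is not the MergeIterator from the attached patches; the names {{HeapMergeSketch}}, {{Candidate}} and {{advance}} are assumptions made for this example. Note that the constructor still calls hasNext() on every child once in order to prime the heap.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Hypothetical sketch, not the MergeIterator from the attached patches.
final class HeapMergeSketch<T> implements Iterator<T> {
    // One child iterator plus the single item most recently pulled from it.
    private static final class Candidate<E> {
        final Iterator<E> source;
        E head;

        Candidate(Iterator<E> source, E head) {
            this.source = source;
            this.head = head;
        }
    }

    private final PriorityQueue<Candidate<T>> heap;
    // Child whose head was handed out by the last next(); not yet advanced.
    private Candidate<T> consumed;

    HeapMergeSketch(List<Iterator<T>> sources, Comparator<? super T> cmp) {
        Comparator<Candidate<T>> byHead = (a, b) -> cmp.compare(a.head, b.head);
        heap = new PriorityQueue<>(byHead);
        for (Iterator<T> it : sources) {
            if (it.hasNext()) {
                heap.add(new Candidate<>(it, it.next()));
            }
        }
    }

    // Advance the previously returned child only when the caller asks for more,
    // so hasNext() on a child is called once per item it yields.
    private void advance() {
        if (consumed == null) return;
        if (consumed.source.hasNext()) {
            consumed.head = consumed.source.next();
            heap.add(consumed); // O(log M) re-insertion
        }
        consumed = null;
    }

    @Override
    public boolean hasNext() {
        advance();
        return !heap.isEmpty();
    }

    @Override
    public T next() {
        advance();
        Candidate<T> least = heap.poll(); // O(log M) removal of the least head
        if (least == null) throw new NoSuchElementException();
        consumed = least;
        return least.head;
    }
}
```

Because the child that produced the last value is left untouched until the next call, a caller has the chance to finish consuming a returned row before anything asks that child whether more rows follow.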

