cassandra-commits mailing list archives

From "Stu Hood (JIRA)" <j...@apache.org>
Subject [jira] Updated: (CASSANDRA-1106) Use Scanner API for all reads
Date Wed, 19 May 2010 15:39:53 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-1106:
--------------------------------

    Description: 
The goal of this issue is to eliminate the IColumnIterator interface, and to use the Slice/Scanner
API for all reads. Additionally, this issue begins to optimize the interaction between FilteredScanner
and QueryFilter to gain back speed lost in CASSANDRA-1095.

This issue adds Memtable.Scanner and converts Memtables to maps from DecoratedKey -> List<Slice>
(where the list represents a row: one entry for Standard CFs, and more than one entry for
Super CFs). Since Slices are immutable, rows in the Memtable are merged using SliceMergingIterator,
and the merged row is swapped in atomically. This gives much coarser-grained atomicity than we
currently support, so this approach to mapping the Memtable to Slices is wide open to debate.
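
As a rough illustration of the merge-and-swap idea (not the code in the attached patches), the
sketch below keeps each row as an immutable list and replaces the whole row with a compare-and-set.
SliceSketch, MemtableSketch, the String keys standing in for DecoratedKey, and the merge() helper
standing in for SliceMergingIterator are all hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for Slice: an immutable, sorted run of column names.
final class SliceSketch {
    final List<String> columns;

    SliceSketch(List<String> columns) {
        // Sort and de-duplicate, then freeze the result.
        this.columns = Collections.unmodifiableList(
                new ArrayList<String>(new TreeSet<String>(columns)));
    }
}

// Hypothetical stand-in for a Memtable: key -> immutable row.
final class MemtableSketch {
    // String keys stand in for DecoratedKey; each value is one immutable row.
    private final ConcurrentMap<String, List<SliceSketch>> rows =
            new ConcurrentHashMap<String, List<SliceSketch>>();

    // Stand-in for SliceMergingIterator: builds a new row and leaves both inputs untouched.
    // For brevity the merged row is always a single slice (the Standard CF case).
    static List<SliceSketch> merge(List<SliceSketch> current, List<SliceSketch> update) {
        List<String> names = new ArrayList<String>();
        for (SliceSketch s : current) names.addAll(s.columns);
        for (SliceSketch s : update) names.addAll(s.columns);
        return Collections.singletonList(new SliceSketch(names));
    }

    // Merge the update into the row and swap the whole row in with a compare-and-set,
    // retrying if another writer replaced the row in the meantime.
    void apply(String key, List<SliceSketch> update) {
        while (true) {
            List<SliceSketch> current = rows.get(key);
            if (current == null) {
                if (rows.putIfAbsent(key, update) == null) return;
            } else if (rows.replace(key, current, merge(current, update))) {
                return;
            }
        }
    }

    List<SliceSketch> get(String key) {
        return rows.get(key);
    }
}
```

The whole row is the unit of atomicity here: a reader sees either the old row or the merged one,
never a partially applied update, which is the granularity trade-off described above.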

The row cache in this patch mimics the Memtable and becomes a map from DecoratedKey ->
List<Slice>. In order to reuse the QueryFilter API, a db.ListScanner is added to wrap
an individual row in the cache for filtering. One limitation imposed by this design is that
the row cache can't be used as a write-through cache, since its entries are immutable.
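
To make the idea of wrapping a cached row concrete, here is a minimal sketch of a seekable scanner
over an immutable, sorted list. It is not the db.ListScanner from the patch: the class name
ListScannerSketch, the seekTo() method and its semantics, and the use of plain String column names
are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical scanner over one cached row, given as a sorted, immutable list of column names.
final class ListScannerSketch implements Iterator<String> {
    private final List<String> sortedRow;
    private int pos = 0;

    ListScannerSketch(List<String> sortedRow) {
        this.sortedRow = sortedRow;
    }

    // Position the scanner at the first entry >= target; returns false if there is none.
    boolean seekTo(String target) {
        int idx = Collections.binarySearch(sortedRow, target);
        pos = idx >= 0 ? idx : -(idx + 1); // a negative result encodes the insertion point
        return pos < sortedRow.size();
    }

    @Override
    public boolean hasNext() {
        return pos < sortedRow.size();
    }

    @Override
    public String next() {
        if (!hasNext()) throw new NoSuchElementException();
        return sortedRow.get(pos++);
    }
}
```

Because the scanner only reads the list, the cached row can be shared freely between readers; the
price is that an update has to replace the cache entry rather than modify it in place.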

The common order of operations is:
1. Get a SeekableScanner implementation for the Memtable/cache entry/SSTable
2. Build a QueryFilter describing the query
3. Call QueryFilter.filter(scanner) to wrap the SeekableScanner in a FilteredScanner
   * Optionally, merge multiple Scanners using MergingScanner
4. Call QueryFilter.collect(scanner) to wrap garbage collection around the merged input
5. Limit the output columns using QueryFilter.limit(scanner)
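
The ordering of the steps above can be sketched with plain iterators. This is only an illustration
under stated assumptions: ReadPathSketch, Col, and the static filter/merge/collect/limit helpers
are hypothetical and do not reproduce the QueryFilter or MergingScanner signatures; the merge is
eager for brevity, and "first input wins" stands in for timestamp reconciliation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeMap;
import java.util.function.Predicate;

final class ReadPathSketch {
    // Hypothetical column: a name plus a tombstone flag.
    static final class Col {
        final String name;
        final boolean tombstone;
        Col(String name, boolean tombstone) { this.name = name; this.tombstone = tombstone; }
    }

    // Step 3: keep only columns whose name lies in [start, end].
    static Iterator<Col> filter(Iterator<Col> in, String start, String end) {
        return keep(in, c -> c.name.compareTo(start) >= 0 && c.name.compareTo(end) <= 0);
    }

    // Optional step: merge two inputs by name; on equal names the first input wins.
    static Iterator<Col> merge(Iterator<Col> first, Iterator<Col> second) {
        TreeMap<String, Col> merged = new TreeMap<String, Col>();
        while (second.hasNext()) { Col c = second.next(); merged.put(c.name, c); }
        while (first.hasNext()) { Col c = first.next(); merged.put(c.name, c); }
        return merged.values().iterator();
    }

    // Step 4: garbage-collect tombstones, only after merging so they can shadow older data.
    static Iterator<Col> collect(Iterator<Col> in) {
        return keep(in, c -> !c.tombstone);
    }

    // Step 5: stop after max columns.
    static Iterator<Col> limit(final Iterator<Col> in, final int max) {
        return new Iterator<Col>() {
            private int seen = 0;
            public boolean hasNext() { return seen < max && in.hasNext(); }
            public Col next() {
                if (!hasNext()) throw new NoSuchElementException();
                seen++;
                return in.next();
            }
        };
    }

    // Shared helper: lazily skip elements that fail the predicate.
    static Iterator<Col> keep(final Iterator<Col> in, final Predicate<Col> p) {
        return new Iterator<Col>() {
            private Col pending = advance();
            private Col advance() {
                while (in.hasNext()) { Col c = in.next(); if (p.test(c)) return c; }
                return null;
            }
            public boolean hasNext() { return pending != null; }
            public Col next() {
                if (pending == null) throw new NoSuchElementException();
                Col result = pending;
                pending = advance();
                return result;
            }
        };
    }

    // Steps 1-5 wired together for two sources (for example a Memtable and an SSTable).
    static List<String> read(List<Col> newer, List<Col> older, String start, String end, int max) {
        Iterator<Col> merged = merge(filter(newer.iterator(), start, end),
                                     filter(older.iterator(), start, end));
        Iterator<Col> out = limit(collect(merged), max);
        List<String> names = new ArrayList<String>();
        while (out.hasNext()) names.add(out.next().name);
        return names;
    }
}
```

Collecting before merging would drop a tombstone too early and let the older, shadowed column
reappear; limiting before collecting would count columns that are about to be discarded.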

Optimization between FilteredScanner and QueryFilter is accomplished via the MatchResult object,
which is pretty ugly and still a work in progress. Inside a QueryFilter, the IFilter for each
level returns a MatchResult indicating where its next interesting match is, and QueryFilter
composes the levels into a single MatchResult that a FilteredScanner uses to seek on its
underlying Scanner.
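
One way to picture the composition of per-level hints is sketched below for two levels (super column
name, then column name). This is a guess at the general shape, not the MatchResult or IFilter code
in the patches: MatchSketch, Result, NameFilter, and compose() are hypothetical, and each level is
assumed to be a non-empty set of accepted names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class MatchSketch {
    // Hypothetical result: a match, or a (super column, column) position to seek to next.
    static final class Result {
        final boolean matched;
        final List<String> seekTo; // null when matched, or when nothing further can match

        private Result(boolean matched, List<String> seekTo) {
            this.matched = matched;
            this.seekTo = seekTo;
        }
        static Result match() { return new Result(true, null); }
        static Result seek(String superName, String colName) {
            return new Result(false, Arrays.asList(superName, colName));
        }
        static Result exhausted() { return new Result(false, null); }
    }

    // Hypothetical per-level filter accepting an explicit, non-empty set of names.
    static final class NameFilter {
        private final TreeSet<String> names;

        NameFilter(String... accepted) {
            this.names = new TreeSet<String>(Arrays.asList(accepted));
        }
        boolean matches(String name) { return names.contains(name); }
        String first() { return names.first(); }
        // Smallest accepted name strictly greater than the given one, or null.
        String nextAfter(String name) { return names.higher(name); }
    }

    // Compose the two levels into one hint for the scanner's current position.
    static Result compose(NameFilter superFilter, NameFilter colFilter,
                          String superName, String colName) {
        if (superFilter.matches(superName)) {
            if (colFilter.matches(colName)) return Result.match();
            // Same super column: jump to the next accepted column, if any.
            String nextCol = colFilter.nextAfter(colName);
            if (nextCol != null) return Result.seek(superName, nextCol);
        }
        // Otherwise move to the next accepted super column and restart the inner level.
        String nextSuper = superFilter.nextAfter(superName);
        return nextSuper == null ? Result.exhausted() : Result.seek(nextSuper, colFilter.first());
    }
}
```

With hints like these, a scanner positioned on an uninteresting entry can seek straight to the next
candidate instead of iterating over everything in between.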

These patches remove a lot of deeply nested and complicated logic for dealing with super columns
and garbage collection, including IFilter.filterSuperColumn (replaced naturally by Slice filtering),
IFilter.collectReducedColumns (ditto) and ColumnFamilyStore.removeDeleted (replaced by ASlice.GCFunction).
Additionally, they replace scads of AbstractIterator implementations that each implemented
IColumnIterator on a case-by-case basis.

  was:
The goal of this issue is to eliminate the IColumnIterator interface, and to use the Slice/Scanner
API for all reads. Additionally, this issue begins to optimize the interaction between FilteredScanner
and QueryFilter to gain back speed lost in CASSANDRA-1095.

This issue adds Memtable.Scanner and converts Memtables to maps from DecoratedKey -> List<Slice>
(where the list represents a row: one entry for Standard CFs, and more than one entry for
Super CFs). Since Slices are immutable, rows in the Memtable are merged using SliceMergingIterator,
and atomically swapped out. This is much less granular atomicity than we support currently,
so this approach to mapping the Memtable to Slices is wide open to debate.

The row cache in this patch mimics the Memtable and becomes a map from DecoratedKey ->
List<Slice>. In order to reuse the QueryFilter API, a db.ListScanner is added to wrap
an individual row in the cache for filtering. One limitation imposed by this design is that
the row cache can't be used as a write-through cache, since its entries are immutable.

The common order of operations is:
# Get a SeekableScanner implementation for the Memtable/cache entry/SSTable
# Build a QueryFilter describing the query
# Call QueryFilter.filter(scanner) to wrap the SeekableScanner in a FilteredScanner
* Optionally, merge multiple Scanners using MergingScanner
# Call QueryFilter.collect(scanner) to wrap garbage collection around the merged input
# Limit the output columns using QueryFilter.limit(scanner)

Optimization between FilteredScanner and QueryFilter is accomplished via the MatchResult object,
which is pretty ugly, and still a work in progress. Internally to a QueryFilter, IFilters
for each level return MatchResults indicating where their next interesting matches are, and
QueryFilter composes the levels into a MathResult that a FilteredScanner uses to see on its
underlying Scanner.

These patches remove a lot of deeply nested and complicated logic for dealing with super columns
and garbage collection, including IFilter.filterSuperColumn (replaced naturally by Slice filtering),
IFilter.collectReducedColumns (ditto) and ColumnFamilyStore.removeDeleted (replaced by ASlice.GCFunction).
Additionally, they replace scads of AbstractIterator implementations that were implementing
IColumnIterator on a case by case basis.


> Use Scanner API for all reads
> -----------------------------
>
>                 Key: CASSANDRA-1106
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1106
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Stu Hood
>            Priority: Minor
>             Fix For: 0.8
>
>         Attachments: 0001-Implement-transitional-CF-Slice-API.patch,
>                      0002-Per-parent-Slice-based-atomicity-for-Memtables.patch,
>                      0003-Use-Scanner-API-in-RowIteratorFactory-and-port-getTo.patch,
>                      0004-Remove-IColumnIterator-and-other-stale-I-Filter-code.patch,
>                      0005-Add-limit-parameter-to-QueryFilter-rather-than-level.patch,
>                      0006-Add-MatchResult-to-give-FilteredScanner-hints-to-fin.patch,
>                      0007-Compose-level-MatchResults-in-QueryFilter-and-begin-.patch,
>                      0008-Add-IFilter.initial-to-return-the-first-interesting-.patch
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

