accumulo-dev mailing list archives

From "Adam Fuchs (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-697) Break Scanner parameterization from Key,Value to Key,{Something}
Date Sat, 21 Jul 2012 20:06:33 GMT


Adam Fuchs commented on ACCUMULO-697:

I like the concept, but does this go far enough? If Values aren't special, then are Keys special,
and if so, why? Should we make our SortedKeyValueIterator implement Iterable<? extends Object>?
Then the bottom-level iterator (the RFile reader) would produce KeyValue or Entry<Key,Value>
objects, the top-level iterator for scans would have to produce serializable objects,
and the top-level iterator for compactions would have to implement Iterable<Entry<Key,Value>>.
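A minimal sketch of the layering proposed above, in Java. Everything here is hypothetical: `TypedIterator<T>` stands in for a generic `SortedKeyValueIterator`, `Key` and `Value` are simplified records rather than the real Accumulo classes, and `MapLevel` is an illustrative transforming level. The two top-level constraints (serializable output for scans, `Entry<Key,Value>` output for compactions) are expressed as compile-time type bounds.

```java
import java.io.Serializable;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Simplified stand-ins for Accumulo's Key and Value (hypothetical, for illustration only).
record Key(String row) {}
record Value(byte[] bytes) {}

// Hypothetical generic replacement for SortedKeyValueIterator: every level of the
// stack is simply an Iterable over some element type T.
interface TypedIterator<T> extends Iterable<T> {}

// Bottom level (standing in for the RFile reader): yields Key/Value pairs.
final class SourceLevel implements TypedIterator<Map.Entry<Key, Value>> {
    private final List<Map.Entry<Key, Value>> data;

    SourceLevel(List<Map.Entry<Key, Value>> data) { this.data = List.copyOf(data); }

    public Iterator<Map.Entry<Key, Value>> iterator() { return data.iterator(); }
}

// A transforming level: maps each element of its source to a new type.
final class MapLevel<A, B> implements TypedIterator<B> {
    private final Iterable<A> source;
    private final Function<A, B> fn;

    MapLevel(Iterable<A> source, Function<A, B> fn) {
        this.source = source;
        this.fn = fn;
    }

    public Iterator<B> iterator() {
        Iterator<A> it = source.iterator();
        return new Iterator<B>() {
            public boolean hasNext() { return it.hasNext(); }
            public B next() { return fn.apply(it.next()); }
        };
    }
}

// The two top-level constraints from the comment, written as type bounds.
final class TopLevels {
    // Scan: elements only need to be serializable so they can be sent to the client.
    static <T extends Serializable> TypedIterator<T> forScan(TypedIterator<T> top) {
        return top;
    }

    // Compaction: elements must still be Key/Value pairs so they can be written back to disk.
    static TypedIterator<Map.Entry<Key, Value>> forCompaction(TypedIterator<Map.Entry<Key, Value>> top) {
        return top;
    }
}
```

With this shape, a scan-time stack ending in `MapLevel<Map.Entry<Key, Value>, Integer>` is accepted by `forScan`, while passing the same stack to `forCompaction` fails to compile.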

One of the problems we have with iterators now is that the Key and Value are accessed with
separate methods, even though they're always read off of disk together. Splitting up the Key
and Value on the server side is sort of arbitrary and could reduce our ability to parallelize
iterators (if we ever decide that's something we want to do).
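To illustrate the point about split accessors, here is a simplified, hypothetical interface modeled loosely on the `hasTop()`/`next()`/`getTopKey()`/`getTopValue()` methods of `SortedKeyValueIterator` (the real interface also has `init`, `seek`, and `deepCopy`, which are omitted here), plus an adapter that reads the key and value together and emits them as a single immutable `Map.Entry`. This is a sketch of the idea, not Accumulo code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical interface loosely modeled on SortedKeyValueIterator's split accessors.
interface SplitAccess<K, V> {
    boolean hasTop();
    void next();
    K getTopKey();
    V getTopValue();
}

// Simple in-memory implementation over parallel lists of keys and values.
final class ListSplitAccess<K, V> implements SplitAccess<K, V> {
    private final List<K> keys;
    private final List<V> values;
    private int pos = 0;

    ListSplitAccess(List<K> keys, List<V> values) {
        if (keys.size() != values.size()) throw new IllegalArgumentException("size mismatch");
        this.keys = keys;
        this.values = values;
    }

    public boolean hasTop() { return pos < keys.size(); }
    public void next() { pos++; }
    public K getTopKey() { return keys.get(pos); }
    public V getTopValue() { return values.get(pos); }
}

// Adapter: reads key and value together and hands them out as one immutable entry,
// so downstream stages never see a key without its value.
final class Entries {
    static <K, V> Iterator<Map.Entry<K, V>> of(SplitAccess<K, V> src) {
        return new Iterator<Map.Entry<K, V>>() {
            public boolean hasNext() { return src.hasTop(); }

            public Map.Entry<K, V> next() {
                if (!src.hasTop()) throw new NoSuchElementException();
                Map.Entry<K, V> e = Map.entry(src.getTopKey(), src.getTopValue());
                src.next();
                return e;
            }
        };
    }

    // Collects all entries; since each one is self-contained, the resulting list
    // could be handed to independent workers (e.g. via parallelStream()).
    static <K, V> List<Map.Entry<K, V>> drain(SplitAccess<K, V> src) {
        List<Map.Entry<K, V>> out = new ArrayList<>();
        of(src).forEachRemaining(out::add);
        return out;
    }
}
```

Once entries are self-contained values, a later stage can process them without sharing the source's cursor state. Note that `Map.entry` rejects null keys and values.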

Another problem is that SortedKeyValueIterator falls somewhere in between Java's Iterator
and Iterable interfaces. SortedKeyValueIterator holds onto filters, aggregation parameters,
etc. that make it act like a collection, and it keeps a pointer to somewhere in that collection
like an Iterator. I think we should turn SortedKeyValueIterator into something more like an immutable
collection, or a consistent, isolated, unchanging view of the data, and have it implement
Iterable. That might open up opportunities for automating optimization of queries on the server
side, or better support for built-in iterator tree definition languages.
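A minimal sketch of the "immutable collection" idea, assuming a hypothetical `DataView<T>` class: the view owns its configuration (here a list of filter predicates, standing in for filters and aggregation parameters) but no cursor position, adding a filter returns a new view, and every call to `iterator()` returns a fresh, independent cursor. An in-memory `List` stands in for the underlying sorted data.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical immutable view of sorted data: it holds configuration but no position.
final class DataView<T> implements Iterable<T> {
    private final List<T> data;               // stand-in for the underlying sorted files
    private final List<Predicate<T>> filters; // configuration, fixed at construction

    private DataView(List<T> data, List<Predicate<T>> filters) {
        this.data = data;
        this.filters = filters;
    }

    static <T> DataView<T> of(List<T> data) {
        return new DataView<>(List.copyOf(data), List.of());
    }

    // Adding a filter returns a new view; the original view is unchanged.
    DataView<T> filter(Predicate<T> p) {
        List<Predicate<T>> fs = new ArrayList<>(filters);
        fs.add(p);
        return new DataView<>(data, List.copyOf(fs));
    }

    // Each call returns an independent cursor over the same unchanging data.
    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int pos = 0;
            private T pending;
            private boolean hasPending = false;

            // Advance to the next element that passes every filter, if any.
            private void advance() {
                while (!hasPending && pos < data.size()) {
                    T candidate = data.get(pos++);
                    if (filters.stream().allMatch(f -> f.test(candidate))) {
                        pending = candidate;
                        hasPending = true;
                    }
                }
            }

            @Override
            public boolean hasNext() {
                advance();
                return hasPending;
            }

            @Override
            public T next() {
                advance();
                if (!hasPending) throw new NoSuchElementException();
                hasPending = false;
                return pending;
            }
        };
    }
}
```

Because a view's filter list is fixed when the view is built, a server-side planner could in principle inspect it before any data is read; the sketch does not attempt any such optimization.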
> Break Scanner parameterization from Key,Value to Key,{Something}
> ----------------------------------------------------------------
>                 Key: ACCUMULO-697
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 1.5.0
>            Reporter: Josh Elser
>            Assignee: Josh Elser
> When writing a custom iterator, the iterator often has some semantic knowledge of
> what each Key/Value being returned actually means (e.g. a word count iterator could be returning Key/Value
> pairs but is really returning an Integer/Long count in the Value). This forces the client to know
> what is going to be returned and to handle the cast/transformation.
> I believe it should be fairly straightforward to encapsulate this transformation inside
> the Accumulo client code. I plan on investigating the possibility of changing the ScannerBase
> impl, or perhaps making a TypedScannerBase, in which the iterator at the "top" of the stack
> for a scan can return something other than a Value to the client.
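A sketch of the client-side transformation the issue describes, using a hypothetical `TypedScanner<K, T>` (not the proposed `TypedScannerBase`, whose design is not specified here). It wraps any `Iterable` of key/raw-bytes entries and applies a decoder function, so callers see typed values instead of doing the cast themselves. The word-count decoder assumes counts are stored as UTF-8 decimal strings; other encodings would need a different decoder. Against a real Accumulo `Scanner`, which iterates `Map.Entry<Key,Value>`, the raw bytes would come from `Value.get()`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.Map;
import java.util.function.Function;

// Hypothetical typed wrapper: decodes each raw value so callers see typed entries.
final class TypedScanner<K, T> implements Iterable<Map.Entry<K, T>> {
    private final Iterable<Map.Entry<K, byte[]>> raw;
    private final Function<byte[], T> decoder;

    TypedScanner(Iterable<Map.Entry<K, byte[]>> raw, Function<byte[], T> decoder) {
        this.raw = raw;
        this.decoder = decoder;
    }

    @Override
    public Iterator<Map.Entry<K, T>> iterator() {
        Iterator<Map.Entry<K, byte[]>> it = raw.iterator();
        return new Iterator<Map.Entry<K, T>>() {
            @Override
            public boolean hasNext() { return it.hasNext(); }

            @Override
            public Map.Entry<K, T> next() {
                Map.Entry<K, byte[]> e = it.next();
                return Map.entry(e.getKey(), decoder.apply(e.getValue()));
            }
        };
    }
}

// Example decoder for a word count, assuming counts are stored as UTF-8 decimal strings.
final class Decoders {
    static final Function<byte[], Long> UTF8_LONG =
        bytes -> Long.parseLong(new String(bytes, StandardCharsets.UTF_8));
}
```

Keeping the decoder as a plain `Function` means the same wrapper works for any value encoding an iterator chooses to emit.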

This message is automatically generated by JIRA.