accumulo-notifications mailing list archives

From "Christopher Tubbs (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-4586) Make rowiterator fail when unsorted data is observed
Date Mon, 13 Feb 2017 19:05:41 GMT


Christopher Tubbs commented on ACCUMULO-4586:

Currently, {{RowIterator}} groups entries in the same way that the Linux command {{uniq}} does.
That is to say, it groups only adjacent items that match, rather than grouping all
matching items in the stream. I think that, just as with {{uniq}}, partial grouping of unsorted
data is still a valid use case.
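
To make the {{uniq}} analogy concrete, here is a minimal sketch in Python rather than against the Accumulo API. It assumes entries are simple (row, value) pairs; the function name group_adjacent_rows and the sample data are hypothetical and only illustrate adjacent-only grouping, not how {{RowIterator}} is implemented.

```python
from itertools import groupby


def group_adjacent_rows(entries):
    """Group (row, value) pairs by row, merging only adjacent matches.

    This mirrors what `uniq` does for lines: a row that reappears later
    in the stream starts a new group instead of joining the earlier one.
    """
    return [
        (row, [value for _, value in group])
        for row, group in groupby(entries, key=lambda entry: entry[0])
    ]


# Hypothetical sample data: the same entries in sorted and unsorted order.
sorted_entries = [("r1", "a"), ("r1", "b"), ("r2", "c")]
unsorted_entries = [("r1", "a"), ("r2", "c"), ("r1", "b")]

print(group_adjacent_rows(sorted_entries))
# [('r1', ['a', 'b']), ('r2', ['c'])]
print(group_adjacent_rows(unsorted_entries))
# [('r1', ['a']), ('r2', ['c']), ('r1', ['b'])]
```

On sorted input every row appears exactly once. On unsorted input the row r1 appears twice, each time with a partial group, which is the behavior described above.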

The proposed "fix" removes a perfectly valid use of {{RowIterator}}, changing the behavior
to favor a different use case. This is exactly the kind of thing we complain about Thrift
doing, as in THRIFT-1805. I don't consider the current behavior to be a bug, but I do recognize
that it is prone to being used incorrectly, or with invalid assumptions.

Rather than change the behavior of the existing API to accommodate a subset of use cases,
I would prefer a new, alternate API to replace it, which imposes the desired restrictions.
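
As a rough illustration of what such an alternate API might look like, the sketch below adds a separate, stricter grouping function and leaves the adjacent-only behavior untouched. It is written in Python rather than against the Accumulo API, and the names strict_row_groups and UnsortedRowError are hypothetical, not part of any proposal on this issue.

```python
from itertools import groupby


class UnsortedRowError(ValueError):
    """Raised when a row arrives out of sorted order (hypothetical name)."""


def strict_row_groups(entries):
    """Yield (row, values) groups, failing if rows are not in sorted order.

    Grouping is still adjacent-only, but each new row must sort after the
    previous one, so a row can never be silently split into two groups.
    """
    previous_row = None
    for row, group in groupby(entries, key=lambda entry: entry[0]):
        if previous_row is not None and row < previous_row:
            raise UnsortedRowError(
                f"row {row!r} observed after row {previous_row!r}"
            )
        previous_row = row
        yield row, [value for _, value in group]


# Sorted input passes through unchanged.
print(list(strict_row_groups([("r1", "a"), ("r1", "b"), ("r2", "c")])))
# [('r1', ['a', 'b']), ('r2', ['c'])]

# Unsorted input fails once the out-of-order row is reached.
try:
    list(strict_row_groups([("r1", "a"), ("r2", "c"), ("r1", "b")]))
except UnsortedRowError as error:
    print("rejected:", error)
```

Because the function is a generator, the check runs lazily: groups seen before the out-of-order row are still yielded, and the error surfaces only when iteration reaches the offending row.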

> Make rowiterator fail when unsorted data is observed
> ----------------------------------------------------
>                 Key: ACCUMULO-4586
>                 URL:
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.6.6, 1.7.1, 1.8.0
>            Reporter: Keith Turner
>             Fix For: 2.0.0
> A batchscanner was used as a row iterator data source.  The rowiterator expects data
> in sorted order and the batch scanner does not supply data in sorted order.  The row iterator
> should have a sanity check to ensure source data is in sorted order.

This message was sent by Atlassian JIRA
