accumulo-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From dhutchis <>
Subject [GitHub] accumulo issue #159: ACCUMULO-1280: many changes for closing iterators
Date Sat, 19 Nov 2016 03:23:47 GMT
Github user dhutchis commented on the issue:
    Bringing the discussion on exception handling back to the top level, and adding more thoughts.
    ## Exceptions and Closing
    Suppose you're an iterator.  You call `next()` or `seek()` on your source and it throws
an IOException.  I would treat this the same as if `hasTop()` returned false; no more data
is available from your source.  The iterator stack will not necessarily be torn down. The
bottom of the iterator stack might seek to the next range, for example (not sure if this actually
    Accumulo should *guarantee* that `close()` will be called before an iterator is torn down,
say via try-finally, except in extreme cases like OOM.  I see two approaches to doing this:
    1. Specify the contract that every iterator *must* `close()` its source.  This strategy
works if every iterator, system and user-defined, follows the contract.  Existing iterators
would need to be modified to do this, so it's a breaking change.
    2. Retain the list of `IterInfo`s from when Accumulo sets up the iterator stack (see `IteratorUtil.loadIterators()`).
Before Accumulo tears down the stack, for any reason, call `close()` on each iterator individually,
say starting from the root RFile/InMemoryMap iterators and working down (top-down vs. bottom-up
order may not matter). This strategy would not break existing iterators; an iterator that
does not call `close()` on its source is fine since Accumulo would call it. It also provides
better iterator isolation.
    ## Idempotent
    I think `close()` *must* be idempotent; calling `close()` more than once should have the
same effect as calling `close()` once. We should document the requirement. This would also
suggest changing `AutoCloseable` to `Closeable`.
    If we don't require idempotency, then option 2 above is not viable and we should carefully
engineer the implementation for *exactly-once* semantics. Seems difficult. Also, can anyone
think of a use case that requires exactly-once semantics?
    ## Less certain ideas
    Because discussion is good =)
    ### "Reason" for Closing
    It would be awesome if the iterator knew *why* Accumulo is closing it. This doesn't mesh
well with the `AutoCloseable` interface.  Otherwise I would suggest passing a `Reason` interface
parameter to the `close()` method.  Reasons include "the client stopped requesting tuples"
or "switching source" or "starting migration" or "tuple returned outside of seek fence for
this tablet" or "out of fairness to other concurrent scans" or "client cancelled scan". Something
    default void close() {
    default void close(Reason reason) {
      // default implementation, do nothing
    It's fine if we don't do the `Reason`. It would make try-with-resources awkward. Then
again, how often (if ever) would iterators be constructed in try-with-resources blocks?
    I'm also not sure what use cases need this. It seems helpful to provide the iterators
with this information, but maybe it's not necessary.
    ### Put `close()` in `finalize()` for safety
    Yes, we know the JVM does not guarantee `finalize()` would ever be called. But it would
be a nice "safety" against bugs in Accumulo that cause Accumulo to forgo calling `close()`
as it normally must. In
    protected void finalize() throws Throwable {
    Obviously we cannot do this if we don't require `close()` idempotency.

If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at or file a JIRA ticket
with INFRA.

View raw message