accumulo-notifications mailing list archives

From "Dylan Hutchison (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-3751) Iterator Redesign
Date Sun, 26 Apr 2015 04:24:38 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512865#comment-14512865
] 

Dylan Hutchison commented on ACCUMULO-3751:
-------------------------------------------

Thanks [~medined], you pointed me to an idea.  Column families are distinguished because they
can be used for locality groups.  Beyond that, the other fields are effectively laid out next
to each other.  Why distinguish between the parts of a Key that are considered indexable,
versus the Value that is considered non-indexable?

If we laid out {{row-colF-colQ-colVis-timestamp-delete-value}} in a single byte[], then
we could sort on the value as well, just as if there were a {{PartialKey.ROW_COLFAM_COLQUAL_COLVIS_TIME_DEL_VAL}} comparison.
 The column qualifier is then simply a place to put information that sorts before the column
visibility and timestamp are considered.  There's nothing preventing us from putting the source/parent
field you suggest in the column qualifier position, the value position or some other position
(as long as we delimit the "fields" in the byte[] or track their positions).
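
To make the single-byte[] idea concrete, here is a minimal Java sketch. It assumes a 0x00 delimiter that never appears inside a field (real data would need escaping or position tracking), and the names {{FlatEntry}}, {{flatten}} and {{compare}} are made up for illustration; they are not Accumulo API. Timestamp and delete marker are left out: Accumulo sorts timestamps newest-first, so they would need their own order-preserving encoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: not part of Accumulo.
class FlatEntry {
    // Assumed delimiter; only order-preserving if no field contains 0x00.
    static final byte DELIM = 0x00;

    // Concatenate the fields into one byte[], separated by DELIM.
    static byte[] flatten(byte[]... fields) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                out.write(DELIM);
            }
            out.write(fields[i], 0, fields[i].length);
        }
        return out.toByteArray();
    }

    // Unsigned lexicographic comparison over the whole flattened entry,
    // so the trailing value bytes take part in the sort order.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same row, colF, colQ and colVis; only the value differs.
        byte[] a = flatten(utf8("row1"), utf8("cf"), utf8("cq"), utf8(""), utf8("apple"));
        byte[] b = flatten(utf8("row1"), utf8("cf"), utf8("cq"), utf8(""), utf8("banana"));
        System.out.println(compare(a, b) < 0); // true: the value breaks the tie
    }
}
```

Because the comparison runs over the whole array, a "source/parent" field could sit in any position of the {{flatten}} call and would sort accordingly.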

When I describe Accumulo to friends, I tell them that "everything is a byte[]" that can be
interpreted however you choose. This sounds like a greater realization of that motto.

As for geographic data, the best data layout schemes for it use things like Z-order
curves and Hilbert curves, which seem hard to do in the current Accumulo.  Were you thinking
of something that would help with these, or perhaps something more elaborate?
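
For reference, a Z-order (Morton) key just interleaves the bits of the two coordinates, so points that are close in 2-D tend to be close in sort order. A rough Java sketch, assuming both coordinates have already been scaled to unsigned 32-bit integers; {{ZOrder}}, {{spread}}, {{interleave}} and {{toKeyBytes}} are illustrative names, not anything in Accumulo:

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: not part of Accumulo.
class ZOrder {
    // Spread the low 32 bits of x so a zero bit sits between each original bit.
    static long spread(long x) {
        x &= 0xFFFFFFFFL;
        x = (x | (x << 16)) & 0x0000FFFF0000FFFFL;
        x = (x | (x << 8)) & 0x00FF00FF00FF00FFL;
        x = (x | (x << 4)) & 0x0F0F0F0F0F0F0F0FL;
        x = (x | (x << 2)) & 0x3333333333333333L;
        x = (x | (x << 1)) & 0x5555555555555555L;
        return x;
    }

    // x bits land on even positions, y bits on odd positions.
    static long interleave(int x, int y) {
        return spread(x) | (spread(y) << 1);
    }

    // Big-endian bytes, so unsigned byte-wise comparison follows the curve order.
    static byte[] toKeyBytes(long z) {
        return ByteBuffer.allocate(Long.BYTES).putLong(z).array();
    }

    public static void main(String[] args) {
        System.out.println(interleave(3, 5)); // 39
    }
}
```

The resulting bytes could go in whichever key position should drive the sort; range queries over a rectangle still need to be decomposed into several curve ranges, which this sketch does not attempt.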

> Iterator Redesign
> -----------------
>
>                 Key: ACCUMULO-3751
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3751
>             Project: Accumulo
>          Issue Type: Wish
>          Components: tserver
>            Reporter: Dylan Hutchison
>             Fix For: 2.0.0
>
>
> Many Accumulo users have pointed out issues and places for improvement in the iterator
stack formed atop the [SortedKeyValueIterator|https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/SortedKeyValueIterator.html]
interface. This issue aims to gather thoughts and requirements on what would make a new iterator
stack, ideally backward compatible with the current stack.
> h3. {{close}} method for iterators
> See ACCUMULO-1280. Iterators do not have full lifecycle control, since a tablet server
may "tear down" an iterator after it returns from a {{seek}} or {{next}} call. Iterators that
start other threads, access external resources or perform some other action requiring cleanup
must either initialize and tear down those actions within a call to {{seek}} or {{next}},
which is usually prohibitively expensive, or they keep state anyway and "hope for the best,"
possibly by putting cleanup code in the {{finalize}} method, which is not guaranteed to be
called by the JVM but is a better option than nothing.
> Current advice to iterator writers is to "not do" these kinds of operations that require
stateful cleanup.  Adding a {{close}}-like method that the tablet server guarantees will be
called (via try-finally) before an iterator is torn down for any reason would make these iterators
much more stable and easier to write.
> [~billie.rinaldi] has suggested using the [Closeable|https://docs.oracle.com/javase/7/docs/api/java/io/Closeable.html]
or [AutoCloseable|https://docs.oracle.com/javase/7/docs/api/java/lang/AutoCloseable.html]
interface.  The tablet server could call {{close()}} on any SKVI that also implements AutoCloseable.
> It would also be nice for the iterator to know _why_ it is being closed, e.g., because
> * the scan/compact range on the current tablet finished
> * the source is switching
> * the scan batch finished, and we're waiting for the client to request more batches
> * some interrupt occurred (?)
> * to give CPU time or memory to other iterators for fairness (?)
> Such a reason could be passed to the iterator in the same way that the tablet server
has a [MajorCompactionReason|https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/tserver/compaction/MajorCompactionReason.html].
> h3. Iterator Performance
> Some have noticed that Accumulo is CPU-bound because it bottlenecks on the large numbers
of serialize/deserialize operations, object creations and data copying present in the iterator
stack. [~afuchs] made a bunch of changes to system iterators that increased performance significantly
in ACCUMULO-3079.
> We may want to consider more fundamental changes, like putting the row, column family,
qualifier, visibility, delete marker and timestamp in a single byte[] with a field delimiter
character, rather than keeping them in separate byte[]s. This gives an added bonus of easy
key comparisons. Why not also store the Value with the Key rather than split the two into
separate objects?  Imagine a {{getTopEntry}} operation that returns a byte[] that holds all
the components of the Key and Value.
> We should adopt a philosophy of "reuse/alias byte[] buffers as often as possible," copying
only when we need to save a copy. Imagine one extreme where we pass a single byte[] down the
iterator stack rather than a Key or Value wrapping scattered buffers. If we were to consider
changes along this route, we ought to create features that make it easy to grab and manipulate
data in the byte[] as easily as the Key and Value objects, perhaps through well-documented
static methods. This is critical for usability since users are used to object-oriented style
manipulations of Key and Value.
> For backward compatibility, create an interface that extends SKVI and has methods for
passing byte[] references directly, converting the byte[] to old Key/Value objects for iterators
that do not implement the interface extension.
> h3. State-save/restore for iterators
> The information an iterator currently has to build up the state it needs is (1) the {{Map<String,String>}}
options passed to init, (2) the [IteratorEnvironment|https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/IteratorEnvironment.html]
passed to init, and (3) the range passed to seek. When an iterator is torn down, the seek
call after it is next reconstructed has a start key equal to the last key returned (non-inclusive).

> There are a few ways we could imagine letting iterators save their state. 
> # We could allow an iterator to add to or even modify the {{Map<String,String>}}
options passed to init. This should be sufficient for most iterators to save their state and
reload.
> # ACCUMULO-625 proposes a kind of "state cookie" that is emitted from an iterator (maybe
as the return value of a {{close}} method) and is sent back to the client, so that a client
could re-start a scan with this cookie and re-create the iterator in that state at the tablet
server. This seems more powerful but perhaps harder engineering than #1.
> h3. Iterator safety
> [~jstoneham] put forward the idea of encapsulating user iterators in a security manager
in ACCUMULO-1188. 
> [~kturner] had an idea for running iterators in separate processes, and then suggested
using tablet server rolling restarts to handle failing iterators.
> [~elserj] thought about giving long-running iterators the ability to stop their processing
when their scan thread is interrupted but before the iterator returns in ACCUMULO-3348. Similar
to the AutoCloseable suggestion above, we may realize this by checking whether iterators implement
[InterruptibleIterator|https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/system/InterruptibleIterator.html]
and calling their {{setInterruptFlag}} method when they need interruption.
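
The "check whether the iterator implements an optional interface" pattern from the {{close}} and interrupt sections of the description above could look roughly like this. It is only a sketch under stated assumptions: {{ResourceIterator}} and {{readBatch}} are hypothetical stand-ins built on plain {{java.util.Iterator}}, not the SKVI interface or actual tablet server code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: not part of Accumulo.
class CloseSketch {
    // Stand-in for a user iterator that holds a resource needing cleanup.
    static class ResourceIterator implements Iterator<String>, AutoCloseable {
        private final Iterator<String> source;
        boolean closed = false;

        ResourceIterator(List<String> data) {
            this.source = data.iterator();
        }

        @Override
        public boolean hasNext() {
            return source.hasNext();
        }

        @Override
        public String next() {
            return source.next();
        }

        @Override
        public void close() {
            closed = true; // a real iterator would stop threads, release connections, etc.
        }
    }

    // Stand-in for the host: read one batch, then tear the iterator down.
    // The finally block runs close() whether the batch completes or throws.
    static List<String> readBatch(Iterator<String> it, int batchSize) {
        List<String> batch = new ArrayList<>();
        try {
            while (it.hasNext() && batch.size() < batchSize) {
                batch.add(it.next());
            }
            return batch;
        } finally {
            if (it instanceof AutoCloseable) {
                try {
                    ((AutoCloseable) it).close();
                } catch (Exception e) {
                    throw new IllegalStateException("iterator close failed", e);
                }
            }
        }
    }

    public static void main(String[] args) {
        ResourceIterator it = new ResourceIterator(Arrays.asList("a", "b", "c"));
        System.out.println(readBatch(it, 2) + " closed=" + it.closed);
    }
}
```

Iterators that do not implement {{AutoCloseable}} pass through {{readBatch}} untouched, which is what keeps the approach compatible with existing iterators.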



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
