accumulo-dev mailing list archives

From Mike Drob <mad...@cloudera.com>
Subject Re: CMS diff: Apache Accumulo Glossary
Date Tue, 22 Apr 2014 13:58:47 GMT
Could mention that loggers exist as a separate process only in the 1.4 line
and older, or that in later versions the functionality was subsumed by the
tablet server and datanode, via HDFS writes.


On Tue, Apr 22, 2014 at 9:53 AM, alexm@clouderagovt.com <
anonymous@apache.org> wrote:

> Clone URL (Committers only):
>
> https://cms.apache.org/redirect?new=anonymous;action=diff;uri=http://accumulo.apache.org/glossary.mdtext
>
> alexm@clouderagovt.com
>
> Index: trunk/content/glossary.mdtext
> ===================================================================
> --- trunk/content/glossary.mdtext       (revision 1589009)
> +++ trunk/content/glossary.mdtext       (working copy)
> @@ -25,15 +25,15 @@
>  - **iterator** - a mechanism for modifying tablet-local portions of the
> key/value space. Iterators are used for standard administrative tasks as
> well as for custom processing.
>  - **iterator priority** - an iterator must be configured with a
> particular scope and priority.  When a tablet server enters that scope, it
> will instantiate iterators in priority order starting from the smallest
> priority and ending with the largest, and apply each to the data read
> before rewriting the data or sending the data to the user.
>  - **iterator scopes** - the possible scopes for iterators are where the
> tablet server is already reading and/or writing data: minor compaction /
> flush time (*minc* scope), major compaction / file merging time (*majc*
> scope), and query time (*scan* scope)
> -- **gc** -
> +- **gc** - process that identifies temporary files that are no longer
> needed by any process, and deletes them.
>  - **key** - the key into the distributed sorted map which is accumulo.
>  The key is subdivided into row, column, and timestamp.  The column is
> further divided into  family, qualifier, and visibility.
>  - **locality group** - a set of column families that will be grouped
> together on disk.  With no locality groups configured, data is stored on
> disk in row order.  If each column family were configured to be its own
> locality group, the data for each column would be stored separately, in row
> order.  Configuring sets of columns into locality groups is a compromise
> between the two approaches and will improve performance when multiple
> columns are accessed in the same scan.
>  - **log-structured merge-tree** - the sorting / flushing / merging scheme
> on which BigTable's design is based.
> -- **logger** -
> +- **logger** - process that accepts updates to tablet servers and writes
> them to local on-disk storage for redundancy.
>  - **major compaction** - merging multiple files into a single file.  If
> all of a tablet's files are merged into a single file, it is called a *full
> major compaction*.
> -- **master** -
> +- **master** - process that detects and responds to tablet failures,
> balances load across tablet servers by assigning and migrating tablets when
> required, coordinates table operations, and handles tablet server logistics
> (startup, shutdown, recovery).
>  - **minor compaction** - flushing data from memory to disk.  Usually this
> creates a new file for a tablet, but if the memory flushed is merge-sorted
> in with data from an existing file (replacing that file), it is called a
> *merging minor compaction*.
> -- **monitor** -
> +- **monitor** - process that displays status and usage information for
> all Accumulo components.
>  - **permissions** - administrative abilities that must be given to a user
> such as creating tables or users and changing permissions or configuration
> parameters.
>  - **row** - the portion of the key that is controls atomicity.  Keys with
> the same row are guaranteed to remain on a single tablet hosted by a single
> tablet server, therefore multiple key/value pairs can be added to or
> removed from a row at the same time. The row is used for the primary
> sorting of the key.
>  - **scan** - reading a range of key/value pairs.
>
>
