accumulo-dev mailing list archives

From "Adam Fuchs (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ACCUMULO-519) support in-memory compactions
Date Tue, 10 Apr 2012 16:09:19 GMT
support in-memory compactions
-----------------------------

                 Key: ACCUMULO-519
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-519
             Project: Accumulo
          Issue Type: Improvement
          Components: tserver
            Reporter: Adam Fuchs
            Assignee: Adam Fuchs


There are several factors that influence how big to make the in-memory write buffer (tserver.memory.maps.max)
for Accumulo. Two dominant factors that conflict with each other are:
# Overall disk I/O depends somewhat on the log of the ratio of tablet size to initial file
size. A bigger write buffer leads to bigger initial files, which can reduce overall disk
I/O.
# Aggregation, versioning, and deleting take place in the iterator tree, which only applies
during compactions and scans. The in-memory write buffer can buffer many versions of a given
key, and scans can be slow if compactions are infrequent.
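
To make the first factor concrete, here is a rough back-of-the-envelope sketch. It assumes a simple model that is not stated in this issue: each entry is rewritten about log_r(tablet size / initial file size) times, where r is a compaction ratio. The class name, method name, and the ratio of 3 are hypothetical and chosen only for illustration.

```java
// Hypothetical helper, not part of Accumulo. It assumes a simplified model in
// which every major compaction pass multiplies file size by roughly `ratio`.
final class CompactionPassEstimate {

    // Estimated number of times an entry is rewritten before its file
    // reaches the size of the tablet, under the assumed model.
    static double estimatedRewrites(double tabletBytes, double initialFileBytes, double ratio) {
        if (tabletBytes <= 0 || initialFileBytes <= 0 || ratio <= 1.0) {
            throw new IllegalArgumentException("sizes must be positive and ratio must exceed 1");
        }
        double passes = Math.log(tabletBytes / initialFileBytes) / Math.log(ratio);
        // An initial file that is already tablet-sized needs no further rewrites.
        return Math.max(0.0, passes);
    }

    public static void main(String[] args) {
        double tabletBytes = 1024.0 * 1024 * 1024; // 1 GiB tablet, illustrative only
        double ratio = 3.0;                        // assumed compaction ratio
        for (int initialFileMb : new int[] {16, 64, 256}) {
            double initialFileBytes = initialFileMb * 1024.0 * 1024;
            System.out.printf("%d MB initial file: ~%.1f rewrites per entry%n",
                    initialFileMb, estimatedRewrites(tabletBytes, initialFileBytes, ratio));
        }
    }
}
```

Under this model, growing the initial file by a factor of r saves about one rewrite of every entry, which is why the buffer size pulls against the second factor.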

One solution would be to run a stepped compaction in memory, in which the iterator
tree is applied in a log-structured fashion. We can consider the minor compaction
to be two pipelined steps: serializing the map entries, and writing the serialized form to
disk. Once the serialized form has been written to disk, we can free the write-ahead logs
associated with that data.

I propose the following:
# We should buffer the serialized RFile form in memory instead of writing it to disk (call
this a micro-compaction).
# We should implement a merging step for merging existing buffered RFiles with newly serialized
buffers, using the same algorithm that we use for major compaction file selection.
# The in-memory buffer should be micro-compacted aggressively (whenever we have a thread free,
with some minimum allocation of CPU and memory I/O resources to this task).
# The current triggers that we use for minor compactions should be used to select buffered
RFiles from memory and dump them to disk, at which point we can drop the write-ahead log references.
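
The four steps above can be sketched with a toy model. This is only an illustration under loud assumptions: sorted maps stand in for serialized in-memory RFiles, "keep the newest timestamp per key" stands in for the iterator tree, the merge step simply merges every buffered run rather than using the major compaction file-selection algorithm, and no scheduling or write-ahead log handling is modeled. All class and method names are hypothetical and none of this is Accumulo code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the proposal; every name here is hypothetical.
final class MicroCompactionSketch {

    // One buffered update: a key, a timestamp, and a value.
    static final class Update {
        final String key;
        final long timestamp;
        final String value;

        Update(String key, long timestamp, String value) {
            this.key = key;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Live write buffer: may hold many versions of the same key.
    private final List<Update> writeBuffer = new ArrayList<>();
    // Stand-ins for buffered in-memory RFiles: sorted runs, one version per key.
    private final List<TreeMap<String, Update>> bufferedRuns = new ArrayList<>();

    void write(String key, long timestamp, String value) {
        writeBuffer.add(new Update(key, timestamp, value));
    }

    // Step 1: drain the write buffer into a new sorted run, reducing versions.
    void microCompact() {
        if (writeBuffer.isEmpty()) {
            return;
        }
        TreeMap<String, Update> run = new TreeMap<>();
        for (Update u : writeBuffer) {
            keepNewest(run, u);
        }
        writeBuffer.clear();
        bufferedRuns.add(run);
    }

    // Step 2: merge the buffered runs into one, reducing versions again.
    void mergeRuns() {
        if (bufferedRuns.size() < 2) {
            return;
        }
        TreeMap<String, Update> merged = new TreeMap<>();
        for (TreeMap<String, Update> run : bufferedRuns) {
            for (Update u : run.values()) {
                keepNewest(merged, u);
            }
        }
        bufferedRuns.clear();
        bufferedRuns.add(merged);
    }

    // Step 4: hand the buffered data to the caller in key order, standing in
    // for the dump to disk after which log references could be dropped.
    List<Update> flushToDisk() {
        microCompact();
        mergeRuns();
        List<Update> out = bufferedRuns.isEmpty()
                ? new ArrayList<>()
                : new ArrayList<>(bufferedRuns.get(0).values());
        bufferedRuns.clear();
        return out;
    }

    // Number of entries currently held in buffered runs.
    int bufferedEntryCount() {
        int count = 0;
        for (TreeMap<String, Update> run : bufferedRuns) {
            count += run.size();
        }
        return count;
    }

    // Stand-in for the iterator tree: keep only the newest version of a key.
    private static void keepNewest(Map<String, Update> run, Update u) {
        run.merge(u.key, u, (a, b) -> a.timestamp >= b.timestamp ? a : b);
    }
}
```

In this sketch, repeated updates to one key cost one buffered entry after each micro-compaction and merge, however many versions were written, while the run that is eventually flushed still covers the whole buffer.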

Overall, this will allow users to keep the initial files generated by minor compactions large
while alleviating the second concern of buffering too many versions of the same key. Two use
cases that will benefit greatly from this are ACCUMULO-348 (lots of updates to the default
tablet info in the !METADATA table) and aggregation over a small number of
keys. Other considerations that also affect this space are:
# RFiles are column-oriented (with locality groups), while the in-memory map is only row oriented.
Moving to a column-oriented structure sooner would benefit some queries.
# RFiles are optimized for sequential access while the in-memory write buffer requires lots
of random memory access to read a stream of key/value pairs in key order.
# RFiles use configurable compression, while the in-memory map only uses hierarchical organization.
RFiles generally get better compression.
# Currently, writing a column-oriented RFile requires scanning the entire in-memory map for
each locality group. Bigger in-memory maps can take a long time to re-order for minor compaction.
# Memory fragmentation and garbage collection in the JVM are big concerns that a lot of work
has gone into. We need to take those factors into account in implementing this change.
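
The cost of re-ordering mentioned above (one full scan of the in-memory map per locality group) can be illustrated with a small sketch. It assumes a simplified map keyed by "row:family" strings and counts the entries visited; the class, method, and key format are hypothetical and are not how Accumulo represents its in-memory map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy illustration only; names and key format are hypothetical.
final class LocalityGroupScanSketch {

    // Entries visited by the most recent call to writeByGroup.
    static long entriesVisited;

    // rowOrderedMap: "row:family" -> value, sorted by row like the in-memory map.
    // groups: locality group name -> column families belonging to that group.
    static Map<String, List<String>> writeByGroup(TreeMap<String, String> rowOrderedMap,
                                                  Map<String, Set<String>> groups) {
        entriesVisited = 0;
        Map<String, List<String>> sections = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> group : groups.entrySet()) {
            List<String> section = new ArrayList<>();
            // One full pass over the row-ordered map for every locality group.
            for (String key : rowOrderedMap.keySet()) {
                entriesVisited++;
                String family = key.substring(key.indexOf(':') + 1);
                if (group.getValue().contains(family)) {
                    section.add(key);
                }
            }
            sections.put(group.getKey(), section);
        }
        return sections;
    }
}
```

With g locality groups and n buffered entries this visits g * n entries, so the work grows with the size of the write buffer.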


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
