accumulo-notifications mailing list archives

From "Sean Busbey (JIRA)" <j...@apache.org>
Subject [jira] [Created] (ACCUMULO-3248) Document in memory map sizing guidelines
Date Tue, 21 Oct 2014 19:58:33 GMT
Sean Busbey created ACCUMULO-3248:
-------------------------------------

             Summary: Document in memory map sizing guidelines
                 Key: ACCUMULO-3248
                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3248
             Project: Accumulo
          Issue Type: Improvement
          Components: docs
            Reporter: Sean Busbey
             Fix For: 1.7.0


From [~ecn]'s comments on ACCUMULO-3246:

{quote}
A bigger IMM will still be used. It just doesn't help for long-running ingest (which is the
world I live in).
Let's say you have 10G to ingest, 1G / unit time, and a 1G IMM.
At 0.5G, the IMM starts minor compacting. It can write out that 0.5G at about the same speed
as the WAL can accept the next 0.5G.
So, by the time the first 0.5G is done writing, we can start writing the next 0.5G.
Doubling the IMM just moves the bar from 0.5G chunks to 1G chunks. Both of these are large
enough to take advantage of compression and write buffer sizes.
You can argue that you will do fewer major compactions, and that's true. But these also occur
in the background, and don't affect query/ingest except that they consume resources, create
disk contention and invalidate blocks/buffers. Bigger flushes will require longer major compactions
when they finally happen, so there's no win.
So, the IMM for each actively ingesting tablet should be ~ HDFS block size. More IMM will
be used, and will give you some big numbers on initial ingest, but sustained ingest will not
improve.
Because aggregation/combiners run only at compaction time, a larger IMM may actually hurt
performance.
{quote}
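
As a starting point for the ref guide text, the arithmetic in the quote can be sketched as a toy model. This is only an illustration under stated assumptions, not Accumulo code: the function names, the two-half buffer model (flush begins at half full, the other half keeps accepting writes), and the constant rates are all hypothetical simplifications. The sizing helper is one reading of the "~ HDFS block size per actively ingesting tablet" rule; the result would presumably inform a setting such as tserver.memory.maps.max, but that mapping is an assumption as well.

```python
def ingest_accept_time(total_gb, imm_gb, ingest_rate, flush_rate):
    """Unit times until total_gb has been accepted, in a toy two-half IMM model.

    Assumptions (not from Accumulo itself): minor compaction starts when the
    IMM is half full, the other half keeps accepting writes meanwhile, rates
    are constant, and total_gb is a whole multiple of the chunk size.
    """
    chunk = imm_gb / 2.0               # data written out per minor compaction
    fill_time = chunk / ingest_rate    # time to fill one half of the IMM
    flush_time = chunk / flush_rate    # time to write one half to disk
    chunks = total_gb / chunk
    # The first half fills unobstructed. Every later half can only start
    # filling once the previous flush has freed space, so each subsequent
    # cycle is bounded by the slower of filling and flushing.
    return fill_time + (chunks - 1) * max(fill_time, flush_time)


def suggested_imm_bytes(active_tablets, hdfs_block_bytes):
    """Rule of thumb from the quote: about one HDFS block per ingesting tablet."""
    return active_tablets * hdfs_block_bytes


if __name__ == "__main__":
    # The example from the quote: 10G to ingest at 1G per unit time.
    for imm in (1, 2):
        t = ingest_accept_time(10, imm, ingest_rate=1, flush_rate=1)
        print(f"IMM={imm}G: ingest accepted after {t} unit times")
    # Hypothetical sizing: 4 actively ingesting tablets, 128 MiB blocks.
    print(suggested_imm_bytes(4, 128 * 1024**2), "bytes")
```

In this model, when flushing keeps pace with ingest, the 1G and 2G IMM accept the same 10G in the same time, which is the point of the quote: a larger IMM changes the chunk size, not the sustained rate.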

We should roll these into the ref guide.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
