accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (ACCUMULO-3248) Document in memory map sizing guidelines
Date Tue, 21 Oct 2014 20:04:36 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-3248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178959#comment-14178959 ]

Josh Elser edited comment on ACCUMULO-3248 at 10/21/14 8:04 PM:
----------------------------------------------------------------

There is already a comment on this in the user manual. We should expand [there|http://accumulo.apache.org/1.6/accumulo_user_manual.html#_tserver_memory_maps_max]
or just move the consideration to a brand new section and have the property description direct
the user there.


was (Author: elserj):
There is already a comment on this in the user manual. We should expand [http://accumulo.apache.org/1.6/accumulo_user_manual.html#_tserver_memory_maps_max|there]
or just move the consideration to a brand new section and have the property description direct
the user there.

> Document in memory map sizing guidelines
> ----------------------------------------
>
>                 Key: ACCUMULO-3248
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3248
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: docs
>            Reporter: Sean Busbey
>             Fix For: 1.7.0
>
>
> From [~ecn]'s comments on ACCUMULO-3246
> {quote}
> A bigger IMM will still be used. It just doesn't help for long-running ingest (which is the world I live in).
> Let's say you have 10G to ingest, 1G / unit time, and a 1G IMM.
> At .5 G, the IMM starts minor compacting. It can write out that .5G at about the same speed as the WAL can accept the next .5G.
> So, by the time the first .5G is done writing, we can start writing the next .5G.
> Doubling the IMM just moves the bar from .5G chunks to 1G chunks. Both of these are large enough to take advantage of compression and write buffer sizes.
> You can argue that you will do fewer major compactions, and that's true. But these also occur in the background, and don't affect query/ingest except that they consume resources, create disk contention and invalidate blocks/buffers. Bigger flushes will require longer major compactions when they finally happen, so there's no win.
> So, the IMM for each actively ingesting tablet should be ~ HDFS block size. More IMM will be used, and will give you some big numbers on initial ingest, but sustained ingest will not improve.
> Because aggregation/combiners run only at compaction time, a larger IMM may actually hurt performance.
> {quote}
> We should roll these into the ref guide.
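
The arithmetic in the quoted comment can be sketched roughly as follows. This is a toy model only, not Accumulo code: the function and class names are hypothetical, and the assumption that a minor compaction starts when the in-memory map (IMM) is half full is taken from the "At .5 G" example in the quote.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FlushPlan:
    chunk_gb: float    # size of each minor-compaction flush
    flush_count: int   # number of flushes needed for the whole ingest
    written_gb: float  # total data written out by minor compactions


def plan_flushes(total_ingest_gb, imm_gb, flush_fraction=0.5):
    """Toy model of sustained ingest (hypothetical helper, not an Accumulo API).

    Assumes a minor compaction starts once the IMM is `flush_fraction` full,
    as in the quoted example where a 1G IMM starts flushing at 0.5G.
    """
    chunk = imm_gb * flush_fraction
    count = math.ceil(total_ingest_gb / chunk)
    # The amount of data written is the same whatever the chunk size;
    # only the size and number of the chunks change.
    return FlushPlan(chunk_gb=chunk, flush_count=count, written_gb=total_ingest_gb)


small = plan_flushes(total_ingest_gb=10, imm_gb=1)
large = plan_flushes(total_ingest_gb=10, imm_gb=2)
print(small)
print(large)
```

Under these assumptions, the 1G IMM yields 20 flushes of 0.5G and the 2G IMM yields 10 flushes of 1G. The total written is 10G in both cases, which is the point of the quote: doubling the IMM changes the chunk size, not the sustained ingest rate.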



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)