accumulo-user mailing list archives

From Adam Fuchs
Subject Re: Compaction slowing queries
Date Thu, 11 Sep 2014 16:37:00 GMT

Here are a few suggestions:

1. Reduce the number of concurrent compaction threads
(tserver.compaction.major.concurrent.max, and
tserver.compaction.minor.concurrent.max). You probably want to lean
towards twice as many major compaction threads as minor, but that
somewhat depends on how bursty your ingest rate is. The total number
of threads should leave plenty of cores for query processing.

2. Look into using a different compression codec. Snappy or LZ4 can
support a much higher throughput than the default of gzip, although
the compression ratio will not be as good.

3. Consider a key choice that limits the number of actively ingesting
tablets. Writing across all ~100k tablets means they will all be
actively compacting, but if you can arrange your keys such that only
~1k tablets are being actively written to then you can significantly
cut your expected write amplification (i.e. number of major
compactions needed). This is because minor compactions will be larger
and you'll spend proportionally more time writing into smaller
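
A rough way to turn suggestion 1 into numbers is sketched below. It is
only an illustration: the helper names, the 16-core machine, and the
choice to keep 10 cores free for queries are assumptions, not values
from this thread. The only things taken from the text above are the two
property names and the roughly 2:1 major-to-minor split. The printed
lines use the Accumulo shell's `config -s` form.

```python
MAJOR = "tserver.compaction.major.concurrent.max"
MINOR = "tserver.compaction.minor.concurrent.max"


def compaction_thread_budget(total_cores, query_cores):
    """Split the cores not reserved for queries roughly 2:1 major:minor.

    Hypothetical helper: how many cores to reserve for queries is a
    judgment call that depends on the workload.
    """
    budget = total_cores - query_cores
    if budget < 2:
        raise ValueError("need at least 2 cores left over for compactions")
    minor = max(1, budget // 3)  # about one third for minor compactions
    major = budget - minor       # the rest for major compactions
    return {MAJOR: major, MINOR: minor}


def shell_commands(settings):
    """Format the settings as Accumulo shell 'config -s' lines."""
    return [f"config -s {name}={value}" for name, value in settings.items()]


if __name__ == "__main__":
    # Assumed example: 16 cores per tserver, 10 kept free for queries.
    for line in shell_commands(compaction_thread_budget(16, 10)):
        print(line)
```

If ingest is very bursty, shifting a thread from major to minor
compactions may be the better trade, as noted in suggestion 1.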

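The reasoning in suggestion 3 can be illustrated with some
back-of-the-envelope arithmetic. The sketch below assumes a simplified
model in which a tablet server's in-memory map is shared evenly among
the tablets being written to, so each minor compaction flushes roughly
(map size / active tablets). The 1 GB map size is an assumed figure; the
tablet counts come from this thread (about 860 tablets per server,
versus ~1k active tablets spread over ~100 servers, or about 10 per
server). Real flush sizes will vary with ingest skew and configuration.

```python
def avg_minor_compaction_mb(map_size_mb, active_tablets):
    """Approximate size of each flushed file under an even-sharing model."""
    if active_tablets < 1:
        raise ValueError("active_tablets must be at least 1")
    return map_size_mb / active_tablets


if __name__ == "__main__":
    MAP_SIZE_MB = 1024  # assumed in-memory map size per tablet server
    scenarios = [
        ("all tablets active", 860),
        ("~1k active tablets cluster-wide", 10),
    ]
    for label, tablets in scenarios:
        size = avg_minor_compaction_mb(MAP_SIZE_MB, tablets)
        print(f"{label}: ~{size:.1f} MB per minor compaction")
```

Under this model, fewer active tablets per server means fewer, larger
flushed files for the same amount of ingested data, so fewer major
compactions are needed to merge them.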

On Thu, Sep 11, 2014 at 12:06 PM, pdread wrote:
> We have 100+ tablet servers, approx 860 tablets/server, and ingest approx
> 300K+ docs/day. Recently, queries during a minor or major compaction started
> taking 100+ seconds, as opposed to about 2 seconds when no compaction is
> running. Everyone on the cluster is affected, both mapreduce jobs and batch
> scanners.
> One table has as many as 65K tablets.
> In the hopes of reducing the compactions, yesterday we changed the following
> on the 2 tables that appeared to cause most of the compactions:
> compaction.ratio from 3 to 5
> table.file.max from 15 to 45
> split.threshold from 725M to 2G.
> tservers are set to 3G, top shows 6G res and 7G virt for the one I checked.
> The odd thing is we expected the number of tablets to change and it did
> not. The only thing that happened was the number of compactions went up but
> the duration of the compactions went down by about half. Queries in off
> times did not seem to change.
> One more thing, we only store docs < 64M in accumulo, otherwise they are
> written directly to hdfs.
> The question would be, is there a way to reduce the compaction frequency
> and/or duration?
> Thanks in advance.
> Paul
