accumulo-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Hugo <m...@piragua.com>
Subject Re: Advice on increasing ingest rate
Date Wed, 09 Apr 2014 20:46:05 GMT
Currently using the default AccumuloOutputFormat settings which I believe
is 2 threads


On Tue, Apr 8, 2014 at 5:36 PM, <dlmarion@comcast.net> wrote:

>
>
> How many threads are you using in the AccumuloOutputFormat? What is your
> latency set to?
>
>
>
> *From:* Adam Fuchs [mailto:afuchs@apache.org]
> *Sent:* Tuesday, April 08, 2014 5:36 PM
> *To:* user@accumulo.apache.org
> *Subject:* Re: Advice on increasing ingest rate
>
>
>
> MIke,
>
>
>
> What version of Accumulo are you using, how many tablets do you have, and
> how many threads are you using for minor and major compaction pools? Also,
> how big are the keys and values that you are using?
>
>
>
> Here are a few settings that may help you:
>
> 1. WAL replication factor (tserver.wal.replication). This defaults to 3
> replicas (the HDFS default), but if you set it to 2 it will give you a
> performance boost without a huge hit to reliability.
>
> 2. Ingest buffer size (tserver.memory.maps.max), also known as the
> in-memory map size. Increasing this generally improves the efficiency of
> minor compactions and reduces the number of major compactions that will be
> required down the line. 4-8 GB is not unreasonable.
>
> 3. Make sure your WAL settings are such that the size of a log
> (tserver.walog.max.size) multiplied by the number of active logs
> (table.compaction.minor.logs.threshold) is greater than the in-memory map
> size. You probably want to accomplish this by bumping up the number of
> active logs.
>
> 4. Increase the buffer size on the BatchWriter that the clients use. This
> can be done with the setBatchWriterOptions method on the
> AccumuloOutputFormat.
>
>
>
> Cheers,
>
> Adam
>
>
>
>
>
> On Tue, Apr 8, 2014 at 4:47 PM, Mike Hugo <mike@piragua.com> wrote:
>
> Hello,
>
>
>
> We have an ingest process that operates via Map Reduce, processing a large
> set of XML files and  inserting mutations based on that data into a set of
> tables.
>
>
>
> On a 5 node cluster (each node has 64G ram, 20 cores, and ~600GB SSD) I
> get 400k inserts per second with 20 mapper tasks running concurrently.
>  Increasing the number of concurrent mapper tasks to 40 doesn't have any
> effect (besides causing a little more backup in compactions).
>
>
>
> I've increased the table.compaction.major.ratio and increased the number
> of concurrent allowed compactions for both min and max compaction but each
> of those only had negligible impact on ingest rates.
>
>
>
> Any advice on other settings I can tweak to get things to move more
> quickly?  Or is 400k/second a reasonable ingest rate?  Are we at a point
> where we should consider generating r files like the bulk ingest example?
>
>
>
> Thanks in advance for any advice.
>
>
>
> Mike
>
>
>

Mime
View raw message