accumulo-user mailing list archives

From: Mike Hugo <m...@piragua.com>
Subject: Re: Advice on increasing ingest rate
Date: Wed, 09 Apr 2014 20:41:37 GMT
On Tue, Apr 8, 2014 at 4:35 PM, Adam Fuchs <afuchs@apache.org> wrote:

> Mike,
>
> What version of Accumulo are you using, how many tablets do you have, and
> how many threads are you using for minor and major compaction pools? Also,
> how big are the keys and values that you are using?
>
>
1.4.5
6 threads each for the minor and major compaction pools
Keys and values are not that large; there may be a few outliers, but I would
estimate that most of them are < 1K



> Here are a few settings that may help you:
> 1. WAL replication factor (tserver.wal.replication). This defaults to 3
> replicas (the HDFS default), but if you set it to 2 it will give you a
> performance boost without a huge hit to reliability.
> 2. Ingest buffer size (tserver.memory.maps.max), also known as the
> in-memory map size. Increasing this generally improves the efficiency of
> minor compactions and reduces the number of major compactions that will be
> required down the line. 4-8 GB is not unreasonable.
> 3. Make sure your WAL settings are such that the size of a log
> (tserver.walog.max.size) multiplied by the number of active logs
> (table.compaction.minor.logs.threshold) is greater than the in-memory map
> size. You probably want to accomplish this by bumping up the number of
> active logs.
> 4. Increase the buffer size on the BatchWriter that the clients use. This
> can be done with the setBatchWriterOptions method on the
> AccumuloOutputFormat.
>
>
Thanks for the tips, I'll try these out.
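
To keep a record on-list, here is roughly how I plan to apply items 1-3. This
is only a sketch; the instance name, ZooKeeper host, credentials, table name,
and sizes are placeholders for our setup, and I believe some of these
properties (tserver.memory.maps.max in particular) only take effect after the
tablet servers are restarted.

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.Instance;
    import org.apache.accumulo.core.client.ZooKeeperInstance;

    public class TuneIngest {
        public static void main(String[] args) throws Exception {
            // Placeholders: instance name, ZooKeeper quorum, credentials
            Instance inst = new ZooKeeperInstance("instanceName", "zkhost:2181");
            Connector conn = inst.getConnector("root", "secret".getBytes());

            // 1. WAL replication: 2 instead of the HDFS default of 3
            conn.instanceOperations().setProperty("tserver.wal.replication", "2");

            // 2. In-memory map: 4GB, at the low end of the suggested 4-8GB
            conn.instanceOperations().setProperty("tserver.memory.maps.max", "4G");

            // 3. Keep walog.max.size * minor.logs.threshold above the
            //    in-memory map size: 1G * 5 = 5G > 4G
            conn.instanceOperations().setProperty("tserver.walog.max.size", "1G");
            conn.tableOperations().setProperty("mytable",
                    "table.compaction.minor.logs.threshold", "5");
        }
    }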

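For item 4, something like the following in the job setup. I am sketching the
1.5-style BatchWriterConfig API here; on 1.4.5 I believe the equivalent knobs
are individual static setters on AccumuloOutputFormat, so treat the exact
method names below as an assumption.

    import java.util.concurrent.TimeUnit;

    import org.apache.accumulo.core.client.BatchWriterConfig;
    import org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat;
    import org.apache.hadoop.mapreduce.Job;

    public class IngestJobSetup {
        static void configureOutput(Job job) {
            BatchWriterConfig bwConfig = new BatchWriterConfig();
            bwConfig.setMaxMemory(64 * 1024 * 1024);     // 64MB buffer, up from the default
            bwConfig.setMaxWriteThreads(4);              // parallel sends to tablet servers
            bwConfig.setMaxLatency(2, TimeUnit.MINUTES); // flush at least this often
            AccumuloOutputFormat.setBatchWriterOptions(job, bwConfig);
        }
    }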

> Cheers,
> Adam
>
>
>
> On Tue, Apr 8, 2014 at 4:47 PM, Mike Hugo <mike@piragua.com> wrote:
>
>> Hello,
>>
>> We have an ingest process that operates via MapReduce, processing a
>> large set of XML files and inserting mutations based on that data into a
>> set of tables.
>>
>> On a 5 node cluster (each node has 64G ram, 20 cores, and ~600GB SSD) I
>> get 400k inserts per second with 20 mapper tasks running concurrently.
>> Increasing the number of concurrent mapper tasks to 40 doesn't have any
>> effect (besides causing a little more backup in compactions).
>>
>> I've increased table.compaction.major.ratio and the number of concurrent
>> compactions allowed for both minor and major compactions, but each of
>> those had only a negligible impact on ingest rates.
>>
>> Any advice on other settings I can tweak to get things to move more
>> quickly? Or is 400k/second a reasonable ingest rate? Are we at a point
>> where we should consider generating RFiles like the bulk ingest example?
>>
>> Thanks in advance for any advice.
>>
>> Mike
>>
>
>
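
Adding one more note for the archives: if we do end up generating RFiles, my
rough understanding of the bulk path is below. The table name and HDFS
directories are placeholders, the failures directory has to exist and be
empty before the import, and "job" and "conn" are the Job and Connector from
the snippets above.

    import org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // In the job setup: reducers must emit Key/Value pairs in sorted order,
    // and the job writes RFiles instead of sending live mutations.
    job.setOutputFormatClass(AccumuloFileOutputFormat.class);
    FileOutputFormat.setOutputPath(job, new Path("/tmp/bulk"));

    // After the job completes, hand the finished directory to Accumulo:
    conn.tableOperations().importDirectory("mytable", "/tmp/bulk",
            "/tmp/bulk-fail", false);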
