accumulo-user mailing list archives

From Adam Fuchs <afu...@apache.org>
Subject Re: Advice on increasing ingest rate
Date Wed, 09 Apr 2014 22:33:39 GMT
If the average is around 1k per k/v entry, then I would say that 400MB/s is
very good performance for incremental/streaming ingest into Accumulo on
that cluster. However, I suspect that your entries are probably not that
big on average. Do you have a measurement for MB/s ingest?
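The arithmetic behind that estimate is a quick back-of-envelope check (the ~1 KB average entry size is Adam's assumption, and 400k/s is Mike's reported rate):

```python
# Back-of-envelope: at ~1 KB per key/value entry (assumed average),
# 400k entries/s works out to roughly 400 MB/s of raw ingest.
entries_per_sec = 400_000        # Mike's reported insert rate
avg_entry_bytes = 1024           # assumed ~1 KB average entry size
mb_per_sec = entries_per_sec * avg_entry_bytes / 1024 ** 2
print(mb_per_sec)  # 390.625, i.e. ~400 MB/s
```

If the real average entry is much smaller than 1 KB, the MB/s figure drops proportionally, which is why the measurement matters.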

Adam
On Apr 9, 2014 4:42 PM, "Mike Hugo" <mike@piragua.com> wrote:

>
>
>
> On Tue, Apr 8, 2014 at 4:35 PM, Adam Fuchs <afuchs@apache.org> wrote:
>
>> Mike,
>>
>> What version of Accumulo are you using, how many tablets do you have, and
>> how many threads are you using for minor and major compaction pools? Also,
>> how big are the keys and values that you are using?
>>
>>
> 1.4.5
> 6 threads each for minor and major compaction
> Keys and values are not that large, there may be a few outliers but I
> would estimate that most of them are < 1k
>
>
>
>> Here are a few settings that may help you:
>> 1. WAL replication factor (tserver.wal.replication). This defaults to 3
>> replicas (the HDFS default), but if you set it to 2 it will give you a
>> performance boost without a huge hit to reliability.
>> 2. Ingest buffer size (tserver.memory.maps.max), also known as the
>> in-memory map size. Increasing this generally improves the efficiency of
>> minor compactions and reduces the number of major compactions that will be
>> required down the line. 4-8 GB is not unreasonable.
>> 3. Make sure your WAL settings are such that the size of a log
>> (tserver.walog.max.size) multiplied by the number of active logs
>> (table.compaction.minor.logs.threshold) is greater than the in-memory map
>> size. You probably want to accomplish this by bumping up the number of
>> active logs.
>> 4. Increase the buffer size on the BatchWriter that the clients use. This
>> can be done with the setBatchWriterOptions method on the
>> AccumuloOutputFormat.
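The sizing rule in point 3 can be sanity-checked with a quick calculation; the values below are illustrative choices, not settings from this thread:

```python
# Illustrative check of the WAL sizing rule in point 3: the product of
# tserver.walog.max.size and table.compaction.minor.logs.threshold should
# exceed tserver.memory.maps.max, so a full in-memory map never stalls
# waiting on write-ahead log capacity. Example values, not from the thread.
GB = 1024 ** 3
walog_max_size = 1 * GB          # tserver.walog.max.size
minor_logs_threshold = 5         # table.compaction.minor.logs.threshold
memory_maps_max = 4 * GB         # tserver.memory.maps.max

assert walog_max_size * minor_logs_threshold > memory_maps_max
print("active-log capacity:", walog_max_size * minor_logs_threshold / GB, "GB")
```

As Adam notes, bumping the active-log count is usually the easier lever than enlarging individual logs.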
>>
>>
> Thanks for the tips, I'll try these out
>
>
>> Cheers,
>> Adam
>>
>>
>>
>> On Tue, Apr 8, 2014 at 4:47 PM, Mike Hugo <mike@piragua.com> wrote:
>>
>>> Hello,
>>>
>>> We have an ingest process that operates via Map Reduce, processing a
>>> large set of XML files and inserting mutations based on that data into a
>>> set of tables.
>>>
>>> On a 5 node cluster (each node has 64G ram, 20 cores, and ~600GB SSD) I
>>> get 400k inserts per second with 20 mapper tasks running concurrently.
>>>  Increasing the number of concurrent mapper tasks to 40 doesn't have any
>>> effect (besides causing a little more backup in compactions).
>>>
>>> I've increased the table.compaction.major.ratio and increased the number
>>> of concurrent allowed compactions for both min and max compaction but each
>>> of those only had negligible impact on ingest rates.
>>>
>>> Any advice on other settings I can tweak to get things to move more
>>> quickly?  Or is 400k/second a reasonable ingest rate?  Are we at a point
>>> where we should consider generating RFiles like the bulk ingest example?
>>>
>>> Thanks in advance for any advice.
>>>
>>> Mike
>>>
>>
>>
>
