accumulo-user mailing list archives

From Mike Hugo <m...@piragua.com>
Subject Re: Advice on increasing ingest rate
Date Thu, 10 Apr 2014 14:27:39 GMT
I took a screenshot of the monitor page and attached it.  Currently we're
seeing ingest of about 350k entries per second at 60-90 MB/s, which works out
to an average of roughly 170-260 bytes per entry.


On Wed, Apr 9, 2014 at 5:33 PM, Adam Fuchs <afuchs@apache.org> wrote:

> If the average is around 1k per k/v entry, then I would say that 400MB/s
> is very good performance for incremental/streaming ingest into Accumulo on
> that cluster. However, I suspect that your entries are probably not that
> big on average. Do you have a measurement for MB/s ingest?
>
> Adam
> On Apr 9, 2014 4:42 PM, "Mike Hugo" <mike@piragua.com> wrote:
>
>>
>>
>>
>> On Tue, Apr 8, 2014 at 4:35 PM, Adam Fuchs <afuchs@apache.org> wrote:
>>
>>> Mike,
>>>
>>> What version of Accumulo are you using, how many tablets do you have,
>>> and how many threads are you using for minor and major compaction pools?
>>> Also, how big are the keys and values that you are using?
>>>
>>>
>> 1.4.5
>> 6 threads each for minor and major compactions
>> Keys and values are not that large; there may be a few outliers, but I
>> would estimate that most of them are < 1k
>>
>>
>>
>>> Here are a few settings that may help you:
>>> 1. WAL replication factor (tserver.wal.replication). This defaults to 3
>>> replicas (the HDFS default), but if you set it to 2 it will give you a
>>> performance boost without a huge hit to reliability.
>>> 2. Ingest buffer size (tserver.memory.maps.max), also known as the
>>> in-memory map size. Increasing this generally improves the efficiency of
>>> minor compactions and reduces the number of major compactions that will be
>>> required down the line. 4-8 GB is not unreasonable.
>>> 3. Make sure your WAL settings are such that the size of a log
>>> (tserver.walog.max.size) multiplied by the number of active logs
>>> (table.compaction.minor.logs.threshold) is greater than the in-memory map
>>> size. You probably want to accomplish this by bumping up the number of
>>> active logs.
>>> 4. Increase the buffer size on the BatchWriter that the clients use.
>>> This can be done with the setBatchWriterOptions method on the
>>> AccumuloOutputFormat.
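>>>
>>> For example, something along these lines in the job setup (a rough,
>>> untested sketch -- check the exact method names and signatures against the
>>> javadoc for your version, and treat the sizes as placeholders to tune):
>>>
>>>   // imports: org.apache.accumulo.core.client.BatchWriterConfig,
>>>   //          org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat,
>>>   //          java.util.concurrent.TimeUnit
>>>   // assuming 'job' is the Hadoop Job you already configure for the ingest MR
>>>   BatchWriterConfig bwConfig = new BatchWriterConfig();
>>>   bwConfig.setMaxMemory(100 * 1024 * 1024L);   // ~100 MB client-side buffer (placeholder)
>>>   bwConfig.setMaxLatency(2, TimeUnit.MINUTES); // flush at least this often (placeholder)
>>>   bwConfig.setMaxWriteThreads(8);              // concurrent sends to tservers (placeholder)
>>>   AccumuloOutputFormat.setBatchWriterOptions(job, bwConfig);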
>>>
>>>
>> Thanks for the tips, I'll try these out.
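>>
>> For the record, here's my first pass at those settings (the values and the
>> table name are placeholders I'll tune, and I still need to check which of
>> these 1.4.5 lets me change live versus needing accumulo-site.xml and a
>> restart):
>>
>>   // assuming 'conn' is an existing Connector to the instance
>>   // 1. drop WAL replication from the HDFS default of 3 down to 2
>>   conn.instanceOperations().setProperty("tserver.wal.replication", "2");
>>
>>   // 2. bigger in-memory map for ingest
>>   conn.instanceOperations().setProperty("tserver.memory.maps.max", "4G");
>>
>>   // 3. keep walog.max.size * minor.logs.threshold above the in-memory map,
>>   //    e.g. 1G logs * 5 active logs = 5G > 4G map
>>   conn.instanceOperations().setProperty("tserver.walog.max.size", "1G");
>>   conn.tableOperations().setProperty("my_table",
>>       "table.compaction.minor.logs.threshold", "5");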
>>
>>
>>> Cheers,
>>> Adam
>>>
>>>
>>>
>>> On Tue, Apr 8, 2014 at 4:47 PM, Mike Hugo <mike@piragua.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> We have an ingest process that operates via MapReduce, processing a
>>>> large set of XML files and inserting mutations based on that data into a
>>>> set of tables.
>>>>
>>>> On a 5-node cluster (each node has 64GB RAM, 20 cores, and ~600GB SSD) I
>>>> get 400k inserts per second with 20 mapper tasks running concurrently.
>>>> Increasing the number of concurrent mapper tasks to 40 doesn't have any
>>>> effect (besides causing a little more backup in compactions).
>>>>
>>>> I've increased table.compaction.major.ratio and increased the number of
>>>> allowed concurrent compactions for both minor and major compactions, but
>>>> each of those had only a negligible impact on ingest rates.
>>>>
>>>> Any advice on other settings I can tweak to get things to move more
>>>> quickly?  Or is 400k/second a reasonable ingest rate?  Are we at a point
>>>> where we should consider generating RFiles like the bulk ingest example?
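>>>>
>>>> If I understand that example right, it would mean roughly this instead of
>>>> writing live mutations (untested sketch; the paths and table name are
>>>> placeholders, and the job's reducers would have to emit keys in sorted
>>>> order):
>>>>
>>>>   // job writes sorted key/values out as RFiles instead of live mutations
>>>>   job.setOutputFormatClass(AccumuloFileOutputFormat.class);
>>>>   AccumuloFileOutputFormat.setOutputPath(job, new Path("/tmp/bulk/files"));
>>>>
>>>>   // after the job finishes, hand the files to the tablet servers
>>>>   // ('conn' being a Connector to the instance)
>>>>   conn.tableOperations().importDirectory("my_table",
>>>>       "/tmp/bulk/files", "/tmp/bulk/failures", false);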
>>>>
>>>> Thanks in advance for any advice.
>>>>
>>>> Mike
>>>>
>>>
>>>
>>
