accumulo-user mailing list archives

From David Medinets
Subject Re: Advice on increasing ingest rate
Date Tue, 08 Apr 2014 22:35:11 GMT
20 cores and just one SSD? Is there a standard recommendation for a
core-to-SSD ratio?

Other questions:

How are you sharding your data (i.e., what does your row look like)?
Are you pre-splitting the table?
How many tablets are ingesting at the same time?
Are you writing from the MapReduce job directly to Accumulo or writing to
RFiles first?
Are the Accumulo nodes and the Hadoop nodes on the same servers?
Do you see the server load spike during ingest?
How much memory are you allocating to the tservers?
How large are the entries on average?
What are the largest entries?
Does the data skew towards large entries?
Are you querying at the same time as ingesting?
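
To make the sharding and pre-splitting questions concrete, here is a minimal
sketch of generating split points for a table whose rows start with a
zero-padded shard number. The row layout (for example "07|docid"), the class
name ShardSplits, and the shard count are assumptions for illustration, not
something described in this thread. In a real client the resulting strings
would typically be converted to Hadoop Text values and passed to
TableOperations.addSplits, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: assumes rows are prefixed with a zero-padded shard id.
final class ShardSplits {

    /**
     * Returns numShards - 1 split points, so that a table pre-split on them
     * ends up with one tablet per shard prefix.
     */
    static List<String> splitPoints(int numShards) {
        if (numShards < 1) {
            throw new IllegalArgumentException("numShards must be >= 1");
        }
        // Pad every id to the width of the largest one so that
        // lexicographic order matches numeric order.
        int width = String.valueOf(numShards - 1).length();
        List<String> splits = new ArrayList<>();
        for (int shard = 1; shard < numShards; shard++) {
            splits.add(String.format(Locale.ROOT, "%0" + width + "d", shard));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Illustrative shard count only.
        System.out.println(splitPoints(16));
    }
}
```

With every tablet receiving writes from the start, ingest is spread across
all tablet servers instead of waiting for the table to split on its own.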

On Tue, Apr 8, 2014 at 5:35 PM, Adam Fuchs wrote:

> Mike,
> What version of Accumulo are you using, how many tablets do you have, and
> how many threads are you using for minor and major compaction pools? Also,
> how big are the keys and values that you are using?
> Here are a few settings that may help you:
> 1. WAL replication factor (tserver.wal.replication). This defaults to 3
> replicas (the HDFS default), but if you set it to 2 it will give you a
> performance boost without a huge hit to reliability.
> 2. Ingest buffer size (tserver.memory.maps.max), also known as the
> in-memory map size. Increasing this generally improves the efficiency of
> minor compactions and reduces the number of major compactions that will be
> required down the line. 4-8 GB is not unreasonable.
> 3. Make sure your WAL settings are such that the size of a log
> (tserver.walog.max.size) multiplied by the number of active logs
> (table.compaction.minor.logs.threshold) is greater than the in-memory map
> size. You probably want to accomplish this by bumping up the number of
> active logs.
> 4. Increase the buffer size on the BatchWriter that the clients use. This
> can be done with the setBatchWriterOptions method on the
> AccumuloOutputFormat.
> Cheers,
> Adam
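
The sizing rule in point 3 is simple arithmetic: the log size multiplied by
the number of active logs must exceed the in-memory map size. A minimal
sketch of that check follows. The class and method names are hypothetical,
and the 4 GiB map and 1 GiB log sizes are illustrative values, not settings
reported in this thread.

```java
// Hypothetical helper for the rule:
//   tserver.walog.max.size * table.compaction.minor.logs.threshold
//     > tserver.memory.maps.max
final class WalSizing {

    /**
     * Returns the smallest number of active logs whose combined size is
     * strictly greater than the in-memory map size.
     */
    static long minActiveLogs(long inMemoryMapBytes, long walogMaxBytes) {
        if (inMemoryMapBytes < 0 || walogMaxBytes <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // floor(map / log) + 1 is the smallest n with n * log > map.
        return inMemoryMapBytes / walogMaxBytes + 1;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        long mapSize = 4 * gib; // illustrative in-memory map size
        long logSize = 1 * gib; // illustrative write-ahead log size
        System.out.println("logs threshold should be at least "
                + minActiveLogs(mapSize, logSize));
    }
}
```

If the threshold is lower than this, the number of logs rather than the map
size is likely to trigger minor compactions, so the larger in-memory map
would not be fully used.
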
> On Tue, Apr 8, 2014 at 4:47 PM, Mike Hugo wrote:
>> Hello,
>> We have an ingest process that operates via MapReduce, processing a
>> large set of XML files and inserting mutations based on that data into a
>> set of tables.
>> On a 5-node cluster (each node has 64 GB RAM, 20 cores, and ~600 GB SSD) I
>> get 400k inserts per second with 20 mapper tasks running concurrently.
>> Increasing the number of concurrent mapper tasks to 40 doesn't have any
>> effect (besides causing a little more backup in compactions).
>> I've increased the table.compaction.major.ratio and increased the number
>> of concurrent allowed compactions for both minor and major compaction, but
>> each of those had only a negligible impact on ingest rates.
>> Any advice on other settings I can tweak to get things to move more
>> quickly? Or is 400k/second a reasonable ingest rate? Are we at a point
>> where we should consider generating RFiles like the bulk ingest example?
>> Thanks in advance for any advice.
>> Mike
