accumulo-user mailing list archives

From: Mike Hugo
Subject: Advice on increasing ingest rate
Date: Tue, 08 Apr 2014 20:47:22 GMT

We have an ingest process that operates via MapReduce, processing a large
set of XML files and inserting mutations based on that data into a set of
tables.

On a 5-node cluster (each node has 64 GB of RAM, 20 cores, and ~600 GB of SSD) I get
400k inserts per second with 20 mapper tasks running concurrently.
Increasing the number of concurrent mapper tasks to 40 doesn't have any
effect (besides causing a little more backup in compactions).
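
For reference, here is a rough breakdown of those numbers. This is only a sketch: it assumes the load is spread evenly across nodes and mappers, which may not hold in practice (it depends on how the tablets are split and hosted), and the function name is just for illustration.

```python
def ingest_breakdown(total_inserts_per_sec, nodes, mappers):
    """Split an aggregate ingest rate into per-node and per-mapper figures.

    Assumption: work is distributed evenly across nodes and mappers.
    """
    return {
        "per_node": total_inserts_per_sec / nodes,
        "per_mapper": total_inserts_per_sec / mappers,
        "mappers_per_node": mappers / nodes,
    }


# Figures quoted above: 400k inserts/s, 5 nodes, 20 or 40 concurrent mappers.
for mappers in (20, 40):
    stats = ingest_breakdown(400_000, nodes=5, mappers=mappers)
    print(
        f"{mappers} mappers: "
        f"{stats['per_node']:,.0f} inserts/s per node, "
        f"{stats['per_mapper']:,.0f} inserts/s per mapper, "
        f"{stats['mappers_per_node']:.0f} mappers per node"
    )
```

Since the aggregate rate stays the same at 40 mappers, the per-mapper rate simply halves under this even-spread assumption, which is why I suspect the limit is not on the mapper side.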

I've increased table.compaction.major.ratio and raised the maximum number of
concurrent compactions allowed (both minor and major), but each of
those had only a negligible impact on ingest rates.

Any advice on other settings I can tweak to get things to move more
quickly?  Or is 400k/second a reasonable ingest rate?  Are we at a point
where we should consider generating RFiles, as in the bulk ingest example?

Thanks in advance for any advice.

