accumulo-user mailing list archives

From Aaron Cordova <aa...@cordovas.org>
Subject Re: Increasing Ingest Rate
Date Thu, 04 Apr 2013 21:22:33 GMT
How many clients are you using to write?

Also the BatchWriter parameters might have an effect too - typically people use values like
the following:

	BatchWriter writer = connector.createBatchWriter(tableName, 1000000, 1000, 10);

Those numbers are 

	1000000 : maxMemory - maximum bytes to buffer on the client before flushing
	1000 : maxLatency - maximum milliseconds a mutation waits before being sent
	10 : maxWriteThreads - threads used to send batches to the tablet servers
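In context, a call with those settings might be used like this - just a sketch, where 'connector' and 'tableName' are assumed to already exist and the row/column values are made up:

	import org.apache.accumulo.core.client.BatchWriter;
	import org.apache.accumulo.core.data.Mutation;
	import org.apache.accumulo.core.data.Value;

	// Sketch only: 'connector' and 'tableName' are assumed to exist already.
	BatchWriter writer = connector.createBatchWriter(
	        tableName,
	        1000000,  // maxMemory: bytes buffered client-side before a flush
	        1000,     // maxLatency: max ms a mutation waits before being sent
	        10);      // maxWriteThreads: threads sending batches to tablet servers
	try {
	    Mutation m = new Mutation("row1");
	    m.put("cf", "cq", new Value("value".getBytes()));
	    writer.addMutation(m);
	} finally {
	    writer.close();  // flushes any remaining buffered mutations
	}

Raising maxMemory and maxWriteThreads lets the client keep more mutations in flight to more tablet servers at once, which is usually where single-client ingest tops out.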

What's the max ingest rate of a single server?


On Apr 4, 2013, at 3:26 PM, Jimmy Lin <jimmys.email@gmail.com> wrote:

> 
> 
> On Thu, Apr 4, 2013 at 2:25 PM, Eric Newton <eric.newton@gmail.com> wrote:
> Have you pre-split your tablet to spread the load out to all the machines? 
> Yes.  We are using splits from loading the whole dataset previously.
> Does the data distribution match your splits?
> Yes.  See above.
> Is the ingest data already sorted (that is, it always writes to the last tablet)?
> No.  The data writes to multiple tablets concurrently.  We set up a queue parameter and divide the data into multiple queues.
> How much memory and how many threads are you using in your batchwriters?
> I believe we have 16GB of memory for the Java writer with 18 threads running per server.
> 
> Check the ingest rates on tablet server monitor page and look for hot spots.
> There are certain servers that have higher ingest rates, and the server that is busiest changes over time, but the overall ingest rate does not go up.
>  
>  
> 
> 
> On Thu, Apr 4, 2013 at 2:01 PM, Jimmy Lin <jimmys.email@gmail.com> wrote:
> Hello,
> I am fairly new to Accumulo and am trying to figure out what is preventing my system from ingesting data at a faster rate. We have 15 nodes running a simple Java program that reads from and writes to Accumulo and then indexes some data into Solr. The rate of ingest is not scaling linearly with the number of nodes that we start up. I have tried increasing several parameters, including:
> - limit of file descriptors in linux
> - max zookeeper connections
> - tserver.memory.maps.max
> - tserver_opts memory size
> - tserver.mutation_queue.max
> - tserver.scan.files.open.max
> - tserver.walog.max.size
> - tserver.cache.data.size
> - tserver.cache.index.size
> - hdfs setting for xceivers
> No matter what changes we make, we cannot get the ingest rate above 100k entries/s and about 6 MB/s. I know Accumulo should be able to ingest faster than this.
> Thanks in advance,
>  
> Jimmy Lin
>  
> 
> 

