accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Configuring batch writers
Date Tue, 19 Jul 2016 17:43:04 GMT
It depends heavily on the requirements of your application and the
amount of data it is serving. One general recommendation that should
apply almost everywhere is to limit each tablet server to hundreds of
tablets. Like everything else here, that is a loose guideline.

Likely, this will require experimentation on your end. If you can
share more details about the specifics of your data set and
requirements, we might be able to give you some more direction.

On Tue, Jul 19, 2016 at 12:35 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> Thank you, this was helpful.  What about the number of splits for a table?
> Is there a general rule of thumb for how many splits and what size they
> should be when trying to balance ingest/query performance?
>
> On Fri, Jul 15, 2016 at 2:38 PM, Emilio Lahr-Vivaz <elahrvivaz@ccri.com>
> wrote:
>>
>> Another thing to consider is how many tablet servers the mutations are
>> being sent to - if they're all going to a single split, that's going to
>> reduce your throughput a lot.
>>
>>
>> On 07/15/2016 02:33 PM, dlmarion@comcast.net wrote:
>>
>> The batch writer has several knobs (latency time, memory buffer, etc.) that
>> you can tune to meet your requirements. The values for those settings will
>> depend on a lot of variables, including:
>>
>>   - number of tablet servers
>>   - size of mutations
>>   - desired latency
>>   - memory buffer
>>   - configuration settings on the table(s) and tablet servers.
>>
>>  I suggest picking a starting point and seeing how it works for you, such as:
>>
>>   threads - equal to the number of tablet servers (unless you have a
>> really large number of tablet servers)
>>   buffer - 100MB
>>   latency - 10 seconds
>>
>>  If you are hitting a wall with those settings, you could increase the
>> buffer and latency and/or change some settings on the server side that have
>> to do with the write ahead logs.
>>
>> ________________________________
>> From: "Jamie Johnson" <jej2003@gmail.com>
>> To: user@accumulo.apache.org
>> Sent: Friday, July 15, 2016 2:16:40 PM
>> Subject: Configuring batch writers
>>
>> Is there any documentation that outlines reasonable settings for batch
>> writers given a known ingest rate?  For instance if I have a source that is
>> producing in the neighborhood of 15MB of mutations per second, what would a
>> reasonable configuration for the batch writer be to handle an ingest at this
>> rate? What are reasonable rules of thumb to follow to ensure that the
>> writers don't block, etc?
>>
>>
>
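
As a rough sanity check on the numbers in this thread, the sketch below
estimates how long a 100MB client buffer takes to fill at 15MB per second
and compares that with the 10 second latency setting. It also computes an
average tablets-per-server figure to hold against the "hundreds of tablets"
guideline. The class and method names are hypothetical, the 1999 split
points and 10 tablet servers are made-up inputs, and the model assumes a
steady ingest rate and evenly balanced tablets. It does not call Accumulo;
in a real client these values would presumably be passed to
BatchWriterConfig (setMaxMemory, setMaxLatency, setMaxWriteThreads).

```java
public class BatchWriterSizing {

    /** Rough time, in seconds, for the client buffer to fill at a steady ingest rate. */
    static double secondsToFill(long bufferBytes, long ingestBytesPerSec) {
        if (bufferBytes <= 0 || ingestBytesPerSec <= 0) {
            throw new IllegalArgumentException("buffer and rate must be positive");
        }
        return (double) bufferBytes / ingestBytesPerSec;
    }

    /** True when the buffer would fill before the latency timer expires. */
    static boolean fillsBeforeLatency(long bufferBytes, long ingestBytesPerSec,
                                      double latencySec) {
        return secondsToFill(bufferBytes, ingestBytesPerSec) < latencySec;
    }

    /**
     * Average tablets per server, rounded up. Assumes even balancing;
     * a table with N split points has N + 1 tablets.
     */
    static long avgTabletsPerServer(long splitPoints, int tabletServers) {
        if (splitPoints < 0 || tabletServers <= 0) {
            throw new IllegalArgumentException("invalid split or server count");
        }
        long tablets = splitPoints + 1;
        return (tablets + tabletServers - 1) / tabletServers;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long buffer = 100 * mb;   // starting point suggested in the thread
        long rate = 15 * mb;      // ingest rate from the original question
        double latency = 10.0;    // seconds, suggested in the thread

        System.out.printf("buffer fills in about %.1f s%n", secondsToFill(buffer, rate));
        System.out.println("fills before latency timer: "
                + fillsBeforeLatency(buffer, rate, latency));
        // Hypothetical cluster: 1999 split points spread over 10 tablet servers.
        System.out.println("avg tablets per server: " + avgTabletsPerServer(1999, 10));
    }
}
```

Under these assumptions the buffer fills in under seven seconds, so the
buffer size rather than the 10 second latency would tend to drive flushes.
That is only an estimate; actual behavior depends on the client and server
configuration and should be confirmed by experiment.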
