accumulo-user mailing list archives

From Jeremy Kepner <kep...@ll.mit.edu>
Subject Re: Improving ingest performance [SEC=UNCLASSIFIED]
Date Wed, 24 Jul 2013 14:35:15 GMT
(5,000,000,000 records) x (~10 entries/record) /
((12 nodes) x (70 minutes) x (60 seconds/minute))

= ~1,000,000 entries/sec/node

This is consistent with other published results
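
Spelled out as a runnable check (plain arithmetic, no Accumulo APIs; the
~10 entries/record fan-out is an estimate, not something Matt stated):

    public class IngestRate {
        public static void main(String[] args) {
            // 5 billion records x ~10 entries/record (estimated fan-out)
            double entries = 5e9 * 10;
            // 12 nodes x 70 minutes x 60 seconds/minute
            double nodeSeconds = 12 * 70 * 60;
            // prints ~992063, i.e. roughly 1,000,000 entries/sec/node
            System.out.printf("%.0f entries/sec/node%n", entries / nodeSeconds);
        }
    }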

On Wed, Jul 24, 2013 at 02:26:18AM -0400, Dickson, Matt MR wrote:
>    UNCLASSIFIED
> 
>    Hi,
> 
>    I'm trying to improve ingest performance on a 12-node test cluster.
>    Currently I'm loading 5 billion records in approximately 70 minutes,
>    which seems excessive.  Monitoring the job, there are 2,600 map tasks
>    (there is no reduce stage, just the mapper) with 288 running at any one
>    time.  The performance seems slowest in the early stages of the job,
>    prior to minor or major compactions occurring.  Each server has 48 GB
>    of memory, and currently the Accumulo settings are based on the 3GB
>    settings in the example config directory, i.e.
>    tserver.memory.maps.max = 1G, tserver.cache.data.size = 50M, and
>    tserver.cache.index.size = 512M.  All other settings on the table are
>    default.
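
(A minimal sketch of adjusting those settings: the properties above can be
raised at runtime through the Java API. The instance name, ZooKeeper quorum,
credentials, and values below are placeholders rather than tuning
recommendations, and some memory settings only take effect after a tablet
server restart.)

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;

    public class RaiseIngestMemory {
        public static void main(String[] args) throws Exception {
            // Placeholder instance name, ZooKeeper quorum, and credentials.
            Connector conn = new ZooKeeperInstance("test", "zk1:2181")
                    .getConnector("root", new PasswordToken("secret"));
            // A larger in-memory map lets each tserver buffer more writes
            // before minor compactions flush them to disk (example value).
            conn.instanceOperations().setProperty("tserver.memory.maps.max", "4G");
            // Example value only; size caches to the 48 GB actually available.
            conn.instanceOperations().setProperty("tserver.cache.index.size", "1G");
        }
    }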
> 
>    Questions.
> 
>    1. What is Accumulo doing in the initial stage of a load and which
>    configurations should I focus on to improve this?
>    2. At what ingest rate should I consider using the bulk ingest process
>    with rfiles?
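
(On question 2, a minimal sketch of the bulk path, assuming a separate
MapReduce job has already written sorted RFiles with
AccumuloFileOutputFormat; the table name and HDFS paths below are
placeholders, and the failures directory must exist and be empty.)

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;

    public class BulkLoad {
        public static void main(String[] args) throws Exception {
            // Placeholder instance name, ZooKeeper quorum, and credentials.
            Connector conn = new ZooKeeperInstance("test", "zk1:2181")
                    .getConnector("root", new PasswordToken("secret"));
            // Register the pre-sorted RFiles with the table; tservers adopt
            // the files directly instead of re-writing each key-value pair.
            conn.tableOperations().importDirectory(
                    "mytable",        // destination table (placeholder)
                    "/bulk/files",    // HDFS dir of RFiles from the MR job
                    "/bulk/failures", // must exist and be empty
                    false);           // false = keep timestamps from the files
        }
    }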
> 
>    Thanks
>    Matt
> 
>    IMPORTANT: This email remains the property of the Department of Defence
>    and is subject to the jurisdiction of section 70 of the Crimes Act 1914.
>    If you have received this email in error, you are requested to contact the
>    sender and delete the email.
