accumulo-user mailing list archives

From "Dickson, Matt MR" <matt.dick...@defence.gov.au>
Subject Improving ingest performance [SEC=UNCLASSIFIED]
Date Wed, 24 Jul 2013 06:26:18 GMT
UNCLASSIFIED

Hi,

I'm trying to improve ingest performance on a 12 node test cluster.  Currently I'm loading
5 billion records in approximately 70 minutes, which seems excessive.  Monitoring the job, there
are 2600 map tasks (there is no reduce stage, just the mapper) with 288 running at any one
time.  The performance seems slowest in the early stages of the job, prior to minor or major
compactions occurring.  Each server has 48 GB of memory, and currently the Accumulo settings are
based on the 3GB settings in the example config directory, i.e. tserver.memory.maps.max = 1GB,
tserver.cache.data.size = 50M and tserver.cache.index.size = 512M.  All other settings on the
table are default.
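For reference, those settings live in accumulo-site.xml; the snippet below simply restates the
values quoted above in that form (the property names are the standard ones from the 3GB example
config, not tuned recommendations):

    <!-- tserver memory/cache settings referenced above, as they would appear
         in accumulo-site.xml; values are the ones quoted in this message -->
    <property>
      <name>tserver.memory.maps.max</name>
      <value>1G</value>
    </property>
    <property>
      <name>tserver.cache.data.size</name>
      <value>50M</value>
    </property>
    <property>
      <name>tserver.cache.index.size</name>
      <value>512M</value>
    </property>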

Questions:

1. What is Accumulo doing in the initial stage of a load and which configurations should I
focus on to improve this?
2. At what ingest rate should I consider using the bulk ingest process with RFiles (see the sketch below)?
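For context on question 2, the bulk ingest path writes sorted RFiles with a MapReduce job and then
hands them to the table in a single import step. A minimal sketch, assuming Accumulo 1.5-era APIs;
the instance name, ZooKeeper host, table name, credentials and HDFS paths are placeholders, and the
mapper and input format for the source data are omitted:

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class BulkIngestSketch {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "bulk-ingest-sketch");
        job.setJarByClass(BulkIngestSketch.class);

        // the mapper (not shown) parses source records and emits Accumulo
        // Key/Value pairs; keeping at least one reduce task means the
        // shuffle/sort delivers keys to each output RFile in sorted order,
        // which bulk import requires
        job.setMapOutputKeyClass(Key.class);
        job.setMapOutputValueClass(Value.class);
        job.setOutputKeyClass(Key.class);
        job.setOutputValueClass(Value.class);
        job.setOutputFormatClass(AccumuloFileOutputFormat.class);
        AccumuloFileOutputFormat.setOutputPath(job, new Path("/tmp/bulk/files"));
        job.waitForCompletion(true);

        // importing the finished RFiles is a metadata-only step: no mutations
        // pass through the tservers' in-memory maps, so the load itself
        // creates no minor/major compaction pressure
        Connector conn = new ZooKeeperInstance("instance", "zoo1:2181")
            .getConnector("user", new PasswordToken("secret"));
        conn.tableOperations().importDirectory(
            "mytable", "/tmp/bulk/files", "/tmp/bulk/failures", false);
      }
    }

The trade-off versus the current mapper-based load is mainly whether sorting the output in the
MapReduce job is cheaper than having the tservers absorb the writes and compact them later.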

Thanks
Matt

IMPORTANT: This email remains the property of the Department of Defence and is subject to
the jurisdiction of section 70 of the Crimes Act 1914. If you have received this email in
error, you are requested to contact the sender and delete the email.
