accumulo-user mailing list archives

From David Medinets <>
Subject Re: stupid/dangerous batch load question
Date Wed, 28 May 2014 18:16:39 GMT
Lots of questions can be asked:

How many servers?
How many compactions are being run at once?
What is the size of the mutations?

What does the Accumulo monitor page say during the ingest process? Does it
indicate high load?

Are you running MapReduce jobs at the same time as the bulk ingest?

I think there is a setting to change the number of threads used by bulk
ingest. Can you run 'config -t' and post the results?
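For reference, a Java client can get the same per-table listing that `config -t` prints: `TableOperations.getProperties(tableName)` returns an `Iterable<Map.Entry<String, String>>`. The sketch below only narrows such a listing to keys that mention "bulk". It makes no assumption about which bulk-related properties exist in a given Accumulo version, and the sample key in `main` is hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

public class BulkConfigFilter {

    // Keep only properties whose key mentions "bulk" (case-insensitive).
    // The input can be any Iterable of key/value entries, for example the
    // result of TableOperations.getProperties(tableName).
    public static Map<String, String> bulkProperties(Iterable<Map.Entry<String, String>> props) {
        Map<String, String> result = new TreeMap<>(); // sorted, like the shell output
        for (Map.Entry<String, String> e : props) {
            if (e.getKey().toLowerCase().contains("bulk")) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> sample = new TreeMap<>();
        sample.put("table.split.threshold", "1G");
        sample.put("example.bulk.threads", "1"); // hypothetical key, for illustration only
        System.out.println(bulkProperties(sample.entrySet()));
    }
}
```

Filtering on the key text alone keeps the sketch version-agnostic. Check the configuration reference for your release to see what each matching property controls.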

I've used tables with thousands of tablets, and I can't remember ever having
to wait for a bulk ingest to finish.

On Wed, May 28, 2014 at 1:49 PM, Seidl, Ed <> wrote:

>  I have a large amount of data that I am batch loading into Accumulo.
>  I'm using MapReduce to read in chunks of data and write out rfiles to be
> loaded with importdirectory.  I've noticed that the import will hang for
> longer and longer times as more data is added.  For instance, one table,
> which currently has ~2500 tablets, now takes around 2 hours to process the
> importdirectory.
>  In poking around in the source for TableOperationsImpl (1.5.0), I see
> that there is an option to not wait on certain operations (like compact).
>  Would it be dangerous to (optionally) return immediately from
> importdirectory, and instead check the fail directory to detect errors in
> the import?  I know this will eventually cause a backup in the staging
> directories, but is there any potential to corrupt the tables?
>  Thanks,
> Ed
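One client-side way to approximate Ed's idea, without modifying `TableOperationsImpl`, is sketched below. The blocking import call runs on a background thread, and the failure directory is inspected only after that call has returned. This assumes the import is wrapped in a `Callable` (for example around `importDirectory(table, dir, failDir, setTime)`). The class and method names are hypothetical. The sketch only changes when the client waits. It says nothing about whether returning early from the server-side operation is safe for the tables. It also shows a pitfall: an empty failure directory means success only once the import has finished.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

public class AsyncBulkImport {

    // Submit a blocking import call to a background thread and return immediately.
    // In a real client the Callable would wrap something like
    // tableOps.importDirectory(table, dir, failDir, false); here it is a placeholder.
    public static Future<Void> startImport(ExecutorService pool, Callable<Void> blockingImport) {
        return pool.submit(blockingImport);
    }

    // Wait for the import to finish, then report what was left in the failure directory.
    // Inspecting failDir before the import completes would wrongly report "no failures".
    public static List<Path> awaitAndCheck(Future<Void> importTask, Path failDir)
            throws InterruptedException, ExecutionException, IOException {
        importTask.get(); // rethrows (wrapped) any exception the import raised
        return failedFiles(failDir);
    }

    // List files in the failure directory; a non-empty result means some files were not loaded.
    // Uses java.nio on a local path for illustration; on HDFS you would list the
    // directory through Hadoop's FileSystem API instead.
    public static List<Path> failedFiles(Path failDir) throws IOException {
        List<Path> failed = new ArrayList<>();
        if (!Files.isDirectory(failDir)) {
            return failed;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(failDir)) {
            for (Path p : entries) {
                failed.add(p);
            }
        }
        Collections.sort(failed);
        return failed;
    }
}
```

A caller would call `startImport`, carry on with the next MapReduce batch, and call `awaitAndCheck` before reusing or cleaning up that batch's directories.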
