accumulo-dev mailing list archives

From Christopher <>
Subject Re: bulk load architecture
Date Thu, 18 Aug 2016 17:39:57 GMT
Bumping this thread up, because I'm also curious if anybody has any
thoughts on Adam's questions.

On Mon, Aug 15, 2016 at 1:49 PM Adam Fuchs <> wrote:

> I've been looking through the bulk load code lately related to some
> performance issues a customer of ours is experiencing, and I'm perplexed by
> a couple of things. Between o.a.a.master.tableOps.LoadFiles and
> o.a.a.server.client.BulkImporter we have 4 thread pools that are used in
> bulk load. It seems like only the master thread pool gets any parallelism
> because we always send one file at a time to the tservers (LoadFiles:154).
> Are the three thread pools in the tserver vestigial? Did we use to send
> bigger batches to the tservers and find that one at a time performed
> better?
> Seems like we could greatly simplify the tserver portion of the bulk load.
> Can anybody think of why that might not be a good idea?
> Also, has anybody optimized the pool sizes for multiple concurrent large
> bulk loads, and do you have suggestions on what settings to use (i.e.
> master.fate.threadpool.size and master.bulk.threadpool.size)?
> Thanks,
> Adam
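
To make the concern in Adam's message concrete, here is a minimal sketch of the pattern he describes, not Accumulo's actual code. It assumes an outer pool that hands batches of files to a worker, which then processes each batch on its own inner pool. All names here (BulkLoadSketch, loadBatch, peakInnerConcurrency) are hypothetical and exist only for illustration. With a batch size of one, the inner pool never has more than one file in flight, so its extra threads sit idle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class BulkLoadSketch {

  // Hypothetical stand-in for the inner (per-request) work: runs one batch of
  // files on its own pool and returns the peak number of files in flight.
  static int loadBatch(List<String> batch, int innerThreads) throws Exception {
    ExecutorService inner = Executors.newFixedThreadPool(innerThreads);
    AtomicInteger inFlight = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    try {
      List<Future<?>> pending = new ArrayList<>();
      for (String file : batch) {
        pending.add(inner.submit(() -> {
          int now = inFlight.incrementAndGet();
          peak.accumulateAndGet(now, Math::max);
          try {
            Thread.sleep(5); // simulated per-file work
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
          }
          inFlight.decrementAndGet();
        }));
      }
      for (Future<?> f : pending) {
        f.get(); // wait for every file in this batch
      }
    } finally {
      inner.shutdown();
    }
    return peak.get();
  }

  // Hypothetical stand-in for the outer (master-side) pool: splits the files
  // into batches, submits each batch, and reports the highest concurrency
  // that any single inner pool reached.
  static int peakInnerConcurrency(int files, int batchSize, int outerThreads,
      int innerThreads) throws Exception {
    ExecutorService outer = Executors.newFixedThreadPool(outerThreads);
    try {
      List<Future<Integer>> results = new ArrayList<>();
      for (int start = 0; start < files; start += batchSize) {
        List<String> batch = new ArrayList<>();
        for (int i = start; i < Math.min(files, start + batchSize); i++) {
          batch.add("file-" + i);
        }
        results.add(outer.submit(() -> loadBatch(batch, innerThreads)));
      }
      int peak = 0;
      for (Future<Integer> r : results) {
        peak = Math.max(peak, r.get());
      }
      return peak;
    } finally {
      outer.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("batch size 1: " + peakInnerConcurrency(8, 1, 4, 3));
    System.out.println("batch size 8: " + peakInnerConcurrency(8, 8, 4, 3));
  }
}
```

In this sketch only the outer pool contributes parallelism when files are sent one at a time, which is the behavior the message asks about. Whether larger batches would actually help in Accumulo depends on details the thread does not settle.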
