accumulo-user mailing list archives

From Roshan Punnoose <rosh...@gmail.com>
Subject Bulk Ingest
Date Fri, 17 Jun 2016 02:08:07 GMT
We are trying to perform bulk ingest at scale and wanted to get some quick
thoughts on how to increase performance and stability. One of the problems
we have is that we sometimes import thousands of small files, and I don't
believe there is a good way around that in the current architecture.
I have already run into an RPC timeout because the import process takes
longer than 5m, and another issue where we end up with so many files after
a bulk import that we have had to bump tserver.scan.files.open.max to
1200.
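One workaround we've been sketching for the thousands-of-small-files problem is to group the files into a bounded number of import directories before handing each directory to a separate importDirectory call, so no single FATE operation has to process the whole set. A rough sketch (paths, batch size, and the .rf naming are made up for illustration; the dummy-file loop just stands in for real rfiles):

```shell
#!/bin/sh
# Sketch: split many small bulk-ingest files into batches of at most
# BATCH_SIZE files per import directory, so each import handles a
# bounded amount of work. All paths here are hypothetical.
set -e
SRC=$(mktemp -d)      # stand-in for the directory of small rfiles
DEST=$(mktemp -d)     # batched import directories go here
BATCH_SIZE=500

# create dummy files to demonstrate (in practice these are rfiles on HDFS)
for i in $(seq 1 1200); do touch "$SRC/f$i.rf"; done

n=0
batch=0
mkdir -p "$DEST/batch$batch"
for f in "$SRC"/*.rf; do
  if [ "$n" -eq "$BATCH_SIZE" ]; then
    batch=$((batch + 1))
    n=0
    mkdir -p "$DEST/batch$batch"
  fi
  mv "$f" "$DEST/batch$batch/"
  n=$((n + 1))
done

echo "created $((batch + 1)) batch directories"
```

Each resulting batch directory would then be passed (with its own failure directory) to the shell's importdirectory command or the TableOperations importDirectory client call; whether smaller batches actually avoid the RPC timeout is something we'd still need to verify.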

Here are some other configs that we have been toying with:
- master.fate.threadpool.size: 20
- master.bulk.threadpool.size: 20
- master.bulk.timeout: 20m
- tserver.bulk.process.threads: 20
- tserver.bulk.assign.threads: 20
- tserver.bulk.timeout: 20m
- tserver.compaction.major.concurrent.max: 20
- tserver.scan.files.open.max: 1200
- tserver.server.threads.minimum: 64
- table.file.max: 64
- table.compaction.major.ratio: 20

(HDFS)
- dfs.namenode.handler.count: 100
- dfs.datanode.handler.count: 50

Just want to get any quick ideas for performing bulk ingest at scale.
Thanks guys

p.s. This is on Accumulo 1.6.5
