accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Bulk Ingest
Date Fri, 17 Jun 2016 02:24:25 GMT
There are two big things that are required to really scale up bulk
loading. Sadly (I guess) they are both things you would need to
implement on your own:

1) Avoid lots of small files. Target files as large as you can,
relative to your ingest latency requirements and your max file size (set
on your instance or table).

2) Avoid having to import one file to multiple tablets. Remember that
the majority of the metadata update for Accumulo is updating the tablet
row with the new file. When you have one file which spans many tablets,
you are now creating N metadata updates instead of just one. When you
create the files, take into account the split points of your table, and
use them to try to target one file per tablet.
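
To make point 2 a bit more concrete, here is a rough sketch of the
grouping step in plain Python. It is only an illustration, not Accumulo
code: the function names (tablet_index, group_by_tablet) and the sample
split points are made up, and it assumes you have already fetched the
table's split points as a sorted list and that rows compare as plain
strings. A tablet holds the rows up to and including its end row, so a
row equal to a split point stays in the tablet that ends there.

```python
from bisect import bisect_left
from collections import defaultdict


def tablet_index(row, splits):
    """Return the index of the tablet that would hold `row`.

    `splits` is the sorted list of split points (tablet end rows).
    Tablet i covers (splits[i-1], splits[i]]; the last tablet covers
    everything after the final split point.
    """
    return bisect_left(splits, row)


def group_by_tablet(rows, splits):
    """Bucket rows so that each bucket falls inside exactly one tablet.

    Each bucket would then be written out as one sorted file, so the
    import touches a single tablet row in the metadata table per file.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tablet_index(row, splits)].append(row)
    # Files handed to a bulk import must be sorted, so sort each bucket.
    return {i: sorted(g) for i, g in sorted(groups.items())}


# Hypothetical split points and row keys, purely for illustration.
splits = ["g", "n", "t"]
rows = ["apple", "g", "kiwi", "zebra", "mango", "n"]
files = group_by_tablet(rows, splits)
for idx, bucket in files.items():
    print(idx, bucket)
```

In a MapReduce job the same idea is usually expressed as a partitioner
keyed on the split points, so that each reducer writes the file for one
tablet. Tablets that receive no data (index 2 above) simply get no file.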

Roshan Punnoose wrote:
> We are trying to perform bulk ingest at scale and wanted to get some
> quick thoughts on how to increase performance and stability. One of the
> problems we have is that we sometimes import thousands of small files,
> and I don't believe there is a good way around this in the architecture
> as of yet. Already I have run into an rpc timeout issue because the
> import process is taking longer than 5m. And another issue where we have
> so many files after a bulk import that we have had to bump the
> tserver.scan.files.open.max to 1K.
>
> Here are some other configs that we have been toying with:
> - master.fate.threadpool.size: 20
> - master.bulk.threadpool.size: 20
> - master.bulk.timeout: 20m
> - tserver.bulk.process.threads: 20
> - tserver.bulk.assign.threads: 20
> - tserver.bulk.timeout: 20m
> - tserver.compaction.major.concurrent.max: 20
> - tserver.scan.files.open.max: 1200
> - tserver.server.threads.minimum: 64
> - table.file.max: 64
> - table.compaction.major.ratio: 20
>
> (HDFS)
> - dfs.namenode.handler.count: 100
> - dfs.datanode.handler.count: 50
>
> Just want to get any quick ideas for performing bulk ingest at scale.
> Thanks guys
>
> p.s. This is on Accumulo 1.6.5
