accumulo-user mailing list archives

From Mike Drob <>
Subject Re: Unbalanced tablets or extra rfiles
Date Tue, 07 Jun 2016 21:18:23 GMT
1) Is your Accumulo Garbage Collector process running? It will delete
unreferenced files.
2) I've heard it said that 200 tablets per tserver is the sweet spot, but
it depends a lot on your read and write patterns.
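Taking the ~200-tablets-per-tserver rule of thumb at face value, picking a split threshold is just division. A back-of-the-envelope sketch (assumes data is spread evenly across tablets; the numbers are illustrative, not from any real cluster):

```python
def split_threshold_gb(data_per_tserver_gb, target_tablets=200):
    # Assuming the table's data splits evenly at the threshold, this
    # picks a table.split.threshold that yields roughly target_tablets
    # tablets per tserver.
    return data_per_tserver_gb / target_tablets

# e.g., 400 GB of table data per tserver, aiming for the ~200-tablet sweet spot:
print(split_threshold_gb(400))  # 2.0 (GB per tablet)
```

Whether 200 is actually the sweet spot depends, as noted, on read and write patterns; the arithmetic only tells you what threshold produces a given tablet count.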

On Tue, Jun 7, 2016 at 4:03 PM, Andrew Hulbert <> wrote:

> Hi all,
> A few questions on behavior if you have any time...
> 1. When looking in accumulo's HDFS directories I'm seeing a situation
> where "tablets" aka "directories" for a table have more than the default 1G
> split threshold worth of rfiles in them. In one large instance, we have
> 400G worth of rfiles in the default_tablet directory (a mix of A, C, and
> F-type rfiles). We took one of these tables and compacted it, and now there
> is, as expected, only ~1G worth of files in HDFS. On an unrelated table we have
> tablets with 100+G of bulk-imported rfiles in the tablet's HDFS directory.
> This seems to be common across multiple clouds. All the ingest is done
> via batch writing. Is anyone aware of why this would happen or if it is
> even important? Perhaps these are leftover rfiles from some process. Their
> timestamps cover large date ranges.
> 2. There's been some discussion on the number of files per tserver for
> efficiency. Are there any limits on the size of rfiles for efficiency? For
> instance, I assume that compacting all the files into a single rfile per 1G
> split is more efficient because it avoids merging (but maybe decreases
> concurrency). However, would it be better to have 500 tablets per node on a
> table with 1G splits, or 50 tablets with 10G splits? Assuming
> HDFS and Accumulo don't mind 10G files!
> 3. Is there any way to force idle tablets to major compact, other than
> via the shell? It seems like it never happens on its own.
> Thanks!
> Andrew
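On question 2 above, the tablet-count side of the tradeoff is just arithmetic; a minimal sketch with the numbers from the question (total data per node held fixed at an illustrative 500 GB):

```python
def tablet_count(data_per_node_gb, split_gb):
    # With a larger split threshold, the same data lands in fewer,
    # larger tablets: fewer files to merge per scan, but less concurrency.
    return data_per_node_gb // split_gb

total_gb = 500  # illustrative total table data per node
print(tablet_count(total_gb, 1))   # 500 tablets at 1G splits
print(tablet_count(total_gb, 10))  # 50 tablets at 10G splits
```

Either way the node holds the same 500 GB; what changes is how many tablets (and thus concurrent scan/compaction units) that data is divided into.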
