From "Seidl, Ed" <>
Subject Re: appending data to tables (partitioning?)
Date Fri, 20 Jul 2012 20:34:01 GMT

On 7/20/12 1:23 PM, "Billie J Rinaldi" <> wrote:

>One thing you should think about is making it so that you only have one
>file per tablet, i.e. that you create a new split point for every new
>file that you import.  This should be doable if your files are pretty
>large and you don't end up having too many tablets.  If there is only one
>file per tablet, it won't compact unless you tell it to.

Awesome...that's exactly the case...I'll have one file per tablet, and all
the files should be more-or-less the same size (within 10% or so), on the
order of a gigabyte each.  Thanks for the split point tip...I hadn't
thought of that.  This should do exactly what I want.
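
The one-split-per-file idea can be sketched roughly as below. This is only an illustration, not Accumulo API code: it assumes the import files cover disjoint row ranges and that a tablet holds the rows after the previous split point up to and including its own split point. The helper names split_points and tablet_index are hypothetical.

```python
from bisect import bisect_left


def split_points(last_rows):
    """Return one split point per file boundary.

    Assumption: each import file covers its own disjoint row range, and
    last_rows holds the last row key of each file. The final tablet is
    open-ended, so the largest key does not need a split point.
    """
    ordered = sorted(set(last_rows))
    return ordered[:-1]


def tablet_index(row, splits):
    """Return the index of the tablet that would hold `row`.

    Assumption: tablet i holds rows in (splits[i-1], splits[i]], i.e. the
    split point itself belongs to the tablet it closes.
    """
    return bisect_left(splits, row)


# Hypothetical files, described by (first row, last row).
files = {"f1": ("a", "f"), "f2": ("g", "m"), "f3": ("n", "z")}
splits = split_points(last for _, last in files.values())

# Map each file to the tablets that its first and last rows fall into.
placement = {
    name: (tablet_index(first, splits), tablet_index(last, splits))
    for name, (first, last) in files.items()
}
```

If the first and last row of every file land in the same tablet, and no two files share a tablet, each tablet ends up with exactly one file.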


>If you want to have multiple files per tablet, there are a number of
>parameters you should think about.  However, you should make sure that
>you don't have too many files per tablet because 1) query performance
>will suffer and 2) there is a limit to the number of files that a tablet
>server will open.  The limit to open files is adjustable.  For scan, it
>defaults to 100 files for all the tablets, and for major compaction it
>defaults to 10 files per tablet (but the compaction can be performed in
>To change the compaction criteria, adjust table.file.max and
>table.compaction.major.ratio.  table.file.max is the maximum number of
>files that a tablet can have.  If a tablet has more files than this, it
>will compact.  table.compaction.major.ratio governs when compaction
>occurs when a tablet has fewer files than the maximum.  It also governs
>which files are compacted together in either case.  Raising the ratio
>will make compactions happen less often.  If table.file.max is larger than the
>number of files you expect to have per tablet, setting
>table.compaction.major.ratio to the same value as table.file.max should
>keep it from compacting unless there is high variation in your file
>sizes.  A set of files is compacted into a single file if the size of the
>largest file times the ratio is <= the sum of the sizes of the files.
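
The ratio test in the last sentence above can be written out as a small check. This is a minimal sketch of that one inequality only, not the tablet server's actual selection logic; the function name should_compact and the sizes in megabytes are made up for illustration.

```python
def should_compact(file_sizes, ratio):
    """Apply the test described above to one set of files.

    Returns True when (largest file size * ratio) <= (sum of all file sizes).
    Sizes can be in any unit as long as it is the same for every file.
    """
    if not file_sizes:
        return False
    return max(file_sizes) * ratio <= sum(file_sizes)


# Ten files of about 1 GB each, sizes given in MB (illustrative numbers).
equal_files = [1000] * 10

# With a ratio of 3 the test passes: 1000 * 3 <= 10000.
low_ratio = should_compact(equal_files, 3)

# With a ratio of 15 (larger than the file count) it fails: 1000 * 15 > 10000.
high_ratio = should_compact(equal_files, 15)

# Three files within about 10% of each other, ratio 3: 1000 * 3 > 2850.
similar_files = should_compact([1000, 950, 900], 3)
```

For n files of equal size the test reduces to ratio <= n, which is why a ratio above the expected number of files per tablet keeps same-sized files from being compacted.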

