hudi-dev mailing list archives

From Vinoth Chandar <vin...@apache.org>
Subject Re: KEEP_LATEST_COMMIT vs KEEP_LATEST_VERSION
Date Tue, 11 Jun 2019 16:20:02 GMT
Cool. So, the cleaning policy determines how we clean up older versions of file
groups (simplistically, old parquet and log files) to bound storage growth.

KEEP_LATEST_COMMITS (default): Retains (does not delete) any file (slice)
that was touched in the last X commits. The idea here is that you are able
to pull incremental changes worth up to X commits.
KEEP_LATEST_FILE_VERSIONS: If you are not interested in incremental pull
at all, you can choose to just retain X files (slices) per file group (i.e.,
files that share the same prefix) instead. This could result in fewer files in
some cases.
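
The two policies described above can be illustrated with a toy Python model. This is only a sketch under stated assumptions, not Hudi's actual cleaner code: commits are modeled as plain integers, each file group is a list of the commits that wrote a slice to it, and the function names are hypothetical. The model also assumes the newest slice of every file group is always retained so that no live data is dropped; the description above does not spell that detail out.

```python
from typing import Dict, List


def keep_latest_commits(
    file_groups: Dict[str, List[int]], timeline: List[int], x: int
) -> Dict[str, List[int]]:
    """Toy KEEP_LATEST_COMMITS: retain slices written by the last x commits."""
    if x < 1:
        raise ValueError("x must be at least 1")
    retained_commits = set(sorted(timeline)[-x:])
    result = {}
    for group_id, slice_commits in file_groups.items():
        ordered = sorted(slice_commits)
        kept = [c for c in ordered if c in retained_commits]
        # Assumption of this sketch: never drop the newest slice of a group,
        # even if it was written before the retained commit window.
        if ordered and ordered[-1] not in kept:
            kept.append(ordered[-1])
        result[group_id] = kept
    return result


def keep_latest_file_versions(
    file_groups: Dict[str, List[int]], x: int
) -> Dict[str, List[int]]:
    """Toy KEEP_LATEST_FILE_VERSIONS: retain the newest x slices per group."""
    if x < 1:
        raise ValueError("x must be at least 1")
    return {
        group_id: sorted(slice_commits)[-x:]
        for group_id, slice_commits in file_groups.items()
    }


if __name__ == "__main__":
    timeline = [1, 2, 3, 4, 5]
    # "fg-hot" is rewritten by every commit; "fg-cold" was last written at commit 3.
    groups = {"fg-hot": [1, 2, 3, 4, 5], "fg-cold": [1, 2, 3]}
    print(keep_latest_commits(groups, timeline, 2))
    print(keep_latest_file_versions(groups, 2))
    print(keep_latest_file_versions(groups, 1))
```

In this model, the commit-based policy with X=2 keeps slices 4 and 5 of the hot group and only slice 3 of the cold group, while the version-based policy with X=2 keeps two slices of each group regardless of how recently they were written. With X=1 the version-based policy keeps exactly one slice per group, which is the "fewer files" case.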

In practice, we always use KEEP_LATEST_COMMITS. I keep thinking about
starting a discussion to retire KEEP_LATEST_FILE_VERSIONS, actually.

Hope that helps.

On Tue, Jun 11, 2019 at 9:05 AM Gary Li <yanjia.gary.li@gmail.com> wrote:

> Hello Vinoth,
>
> Yes, that’s what I mean.
>
> Thanks
> Gary
>
> On Tue, Jun 11, 2019 at 9:03 AM Vinoth Chandar <vinoth@apache.org> wrote:
>
> > Hi Gary,
> >
> > Do you mean cleaning policy? KEEP_LATEST_FILE_VERSIONS vs
> > KEEP_LATEST_COMMITS?
> >
> > Thanks
> > Vinoth
> >
> > On Mon, Jun 10, 2019 at 9:47 PM Gary Li <yanjia.gary.li@gmail.com> wrote:
> >
> > > Hello,
> > >
> > > I got a little confused while looking at the compaction policy. What is
> > > the difference between KEEP_LATEST_COMMIT and KEEP_LATEST_VERSION? What is
> > > the exact definition of "COMMIT" and "VERSION"?
> > >
> > > Thanks,
> > > Gary
> > >
> >
>
