hudi-dev mailing list archives

From Gary Li <yanjia.gary...@gmail.com>
Subject Re: KEEP_LATEST_COMMIT vs KEEP_LATEST_VERSION
Date Wed, 12 Jun 2019 04:53:56 GMT
Thanks, Vinoth. That's very helpful.

When my data consumers don't support the hoodie format, I have to
use KEEP_LATEST_FILE_VERSIONS with CLEANER_FILE_VERSIONS_RETAINED_PROP = "1"
to keep only the latest parquet files, as discussed in
https://github.com/apache/incubator-hudi/issues/715. When I use
KEEP_LATEST_COMMITS with hoodie.cleaner.commits.retained = "1", I
still end up with two versions of the parquet files.
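
For anyone comparing the two policies, here is a minimal Python sketch of
the behaviour described in this thread. It is a simplified model, not
Hudi's cleaner code: the function names and the representation of a file
group as a list of commit times are hypothetical, and the rule that
KEEP_LATEST_COMMITS also keeps one version older than the earliest retained
commit is an assumption chosen to match the two versions reported here.

```python
def keep_latest_file_versions(versions, retained):
    """Keep only the newest `retained` (>= 1) versions of one file group."""
    return sorted(versions)[-retained:]


def keep_latest_commits(versions, all_commits, retained):
    """Keep versions written in the last `retained` (>= 1) commits, plus
    (assumed) the newest version older than the earliest retained commit."""
    commits = sorted(all_commits)
    ordered = sorted(versions)
    if len(commits) <= retained:
        return ordered  # nothing is old enough to clean yet
    earliest_retained = commits[-retained]
    kept = [v for v in ordered if v >= earliest_retained]
    older = [v for v in ordered if v < earliest_retained]
    if older:
        # Assumption: one older version survives so that a reader pinned to
        # the earliest retained commit still has a file to read.
        kept.insert(0, older[-1])
    return kept


commits = ["001", "002", "003"]
file_group = ["001", "002", "003"]  # rewritten in every commit

print(keep_latest_file_versions(file_group, 1))     # ['003']
print(keep_latest_commits(file_group, commits, 1))  # ['002', '003']
```

Under these assumptions, only KEEP_LATEST_FILE_VERSIONS with a retained
count of 1 leaves a single parquet file per file group, which is what a
consumer that reads the raw files needs.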

Compared with running batch jobs, this approach makes my situation much
better. So I'd recommend not retiring KEEP_LATEST_FILE_VERSIONS; other
people might find it as useful as I do.

Thanks!
Gary


On Tue, Jun 11, 2019 at 9:20 AM Vinoth Chandar <vinoth@apache.org> wrote:

> Cool. So, cleaning policy determines how we clean up older versions of file
> groups (simplistically old parquet and log files), to bound storage growth,
>
> KEEP_LATEST_COMMITS (default) : Retains (does not delete) any file (slice)
> that was touched in the last X commits. The idea here is that you are able
> to pull the incremental changes worth up to X commits.
> KEEP_LATEST_FILE_VERSIONS : If you are not interested in incremental pull
> at all, you can choose to just retain X files (slices) per file group (i.e.
> files that share the same prefix) instead. This could result in fewer files in
> some cases.
>
> In practice, we always use KEEP_LATEST_COMMITS. I keep thinking about
> starting a discussion to retire KEEP_LATEST_FILE_VERSIONS, actually.
>
> Hope that helps.
>
> On Tue, Jun 11, 2019 at 9:05 AM Gary Li <yanjia.gary.li@gmail.com> wrote:
>
> > Hello Vinoth,
> >
> > Yes, that’s what I mean.
> >
> > Thanks
> > Gary
> >
> > On Tue, Jun 11, 2019 at 9:03 AM Vinoth Chandar <vinoth@apache.org> wrote:
> >
> > > Hi Gary,
> > >
> > > Do you mean cleaning policy? KEEP_LATEST_FILE_VERSIONS vs
> > > KEEP_LATEST_COMMITS?
> > >
> > > Thanks
> > > Vinoth
> > >
> > > On Mon, Jun 10, 2019 at 9:47 PM Gary Li <yanjia.gary.li@gmail.com> wrote:
> > >
> > > > Hello,
> > > >
> > > > I got a little confused when looking at the compaction policy. What is
> > > > the difference between KEEP_LATEST_COMMIT vs KEEP_LATEST_VERSION? What is
> > > > the exact definition of "COMMIT" and "VERSION"?
> > > >
> > > > Thanks,
> > > > Gary
> > > >
> > >
> >
>
